Abstract

We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many 
frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical 
models and logic. This is our main conceptual contribution. 

We then present a simple algorithm called InsideOut to solve this general problem. InsideOut is 
a variation of the traditional dynamic programming approach for constraint programming based on 
variable elimination. Our variation adds a couple of simple twists to basic variable elimination in order 
to deal with the generality of FAQ, to take full advantage of Grohe and Marx’s fractional edge cover 
framework, and of the analysis of recent worst-case optimal relational join algorithms. 

As is the case with constraint programming and graphical model inference, to make InsideOut run 
efficiently we need to solve an optimization problem to compute an appropriate variable ordering. The 
main technical contribution of this work is a precise characterization of when a variable ordering is 
‘semantically equivalent’ to the variable ordering given by the input FAQ expression. Then, we design an 
approximation algorithm to find an equivalent variable ordering that has the best ‘fractional FAQ-width’. 
Our results imply a host of known and a few new results in graphical model inference, matrix operations, 
relational joins, and logic. 

We also briefly explain how recent algorithms on beyond worst-case analysis for joins and those for 
solving SAT and #SAT can be viewed as variable elimination to solve FAQ over compactly represented 
input functions. 
1 Introduction


1.1 Motivating examples 

The following fundamental problems from three diverse domains share a common algebraic structure. 

Example 1.1. (Matrix Chain Multiplication (MCM)) Given a series of matrices $A_1, \dots, A_n$ over some field
$\mathbb{F}$, where the dimension of $A_i$ is $p_i \times p_{i+1}$, $i \in [n]$, we wish to compute the product
$A = A_1 \cdots A_n$. The problem can be reformulated as follows. There are $n+1$ variables $X_1, \dots, X_{n+1}$
with domains $\mathrm{Dom}(X_i) = [p_i]$, for $i \in [n+1]$. For $i \in [n]$, matrix $A_i$ can be viewed as a
function of two variables

$$\psi_{i,i+1} : \mathrm{Dom}(X_i) \times \mathrm{Dom}(X_{i+1}) \to \mathbb{F},$$

where $\psi_{i,i+1}(x, y) = (A_i)_{xy}$. The MCM problem is to compute the output function

$$\varphi(x_1, x_{n+1}) = \sum_{x_2 \in \mathrm{Dom}(X_2)} \cdots \sum_{x_n \in \mathrm{Dom}(X_n)} \prod_{i=1}^{n} \psi_{i,i+1}(x_i, x_{i+1}).$$
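As an illustrative sketch (not part of the original development), the reformulation in Example 1.1 can be checked by brute force on small inputs. The Python code below treats each matrix as a factor over two consecutive variables and sums the product of all factors over every assignment. The names `mcm_as_faq` and `matmul` and the sample matrices are assumptions made for this illustration. The enumeration is exponential in the number of variables and is meant only to mirror the formula, not to be an efficient algorithm.

```python
from itertools import product

def mcm_as_faq(matrices):
    """Evaluate phi(x_1, x_{n+1}) = sum over x_2..x_n of prod_i psi_{i,i+1}(x_i, x_{i+1})."""
    n = len(matrices)
    # Domain sizes p_1, ..., p_{n+1}: row counts of each matrix, then columns of the last.
    p = [len(a) for a in matrices] + [len(matrices[-1][0])]
    phi = [[0] * p[n] for _ in range(p[0])]
    for x in product(*(range(d) for d in p)):      # x = (x_1, ..., x_{n+1}), 0-based values
        term = 1
        for i, a in enumerate(matrices):
            term *= a[x[i]][x[i + 1]]              # psi_{i,i+1}(x_i, x_{i+1}) = (A_i)_{x_i x_{i+1}}
        phi[x[0]][x[n]] += term                    # bound variables are aggregated with +
    return phi

def matmul(a, b):
    """Plain matrix product, used only as a reference for the check below."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A1 = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
A2 = [[1, 0, 2], [0, 1, 1]]        # 2 x 3
A3 = [[2], [1], [3]]               # 3 x 1
chain = matmul(matmul(A1, A2), A3)
faq_value = mcm_as_faq([A1, A2, A3])
```

Both `chain` and `faq_value` hold the same 3 x 1 product, which is the sense in which MCM is an instance of the sum-product form.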


Example 1.2. (Maximum A Posteriori (MAP) queries in probabilistic graphical models (PGM)) Consider
a discrete graphical model represented by a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$. There are $n$ discrete random variables
$\mathcal{V} = \{X_1, \dots, X_n\}$ on finite domains $\mathrm{Dom}(X_i)$, $i \in [n]$, and $m = |\mathcal{E}|$ factors

$$\psi_S : \prod_{i \in S} \mathrm{Dom}(X_i) \to \mathbb{R}_+, \qquad S \in \mathcal{E}.$$

A typical inference task is to compute the marginal MAP estimates, written in the form

$$\varphi(x_1, \dots, x_f) = \max_{x_{f+1} \in \mathrm{Dom}(X_{f+1})} \cdots \max_{x_n \in \mathrm{Dom}(X_n)} \prod_{S \in \mathcal{E}} \psi_S(x_S).$$


Example 1.3. (# Quantified Conjunctive Query (#QCQ)) Let $\Phi$ be a first-order formula of the form

$$\Phi(X_1, \dots, X_f) = Q_{f+1} X_{f+1} \cdots Q_n X_n \Big( \bigwedge_{R \in \mathrm{atoms}(\Phi)} R \Big),$$

where $Q_i \in \{\exists, \forall\}$, for $i > f$. The #QCQ problem is to count the number of tuples in relation $\Phi$ on the
free variables $X_1, \dots, X_f$. To reformulate #QCQ, construct a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ as follows: $\mathcal{V}$ is the
set of all variables $X_1, \dots, X_n$, and for each $R \in \mathrm{atoms}(\Phi)$ there is a hyperedge $S = \mathrm{vars}(R)$ consisting of all
variables in $R$. The atom $R$ can be viewed as a function indicating whether an assignment $x_S$ to its variables
is satisfied by the atom; namely $\psi_S(x_S) = 1$ if $R(x_S)$ is true and 0 otherwise.

Now, for each $i \in \{f+1, \dots, n\}$ we define an aggregate operator

$$\oplus^{(i)} = \begin{cases} \max & \text{if } Q_i = \exists, \\ \times & \text{if } Q_i = \forall. \end{cases}$$

Then, the #QCQ problem above is to compute the constant function

$$\varphi = \sum_{x_1 \in \mathrm{Dom}(X_1)} \cdots \sum_{x_f \in \mathrm{Dom}(X_f)} \; \bigoplus^{(f+1)}_{x_{f+1} \in \mathrm{Dom}(X_{f+1})} \cdots \bigoplus^{(n)}_{x_n \in \mathrm{Dom}(X_n)} \; \prod_{S \in \mathcal{E}} \psi_S(x_S).$$
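The correspondence in Example 1.3 between quantifiers and aggregates (max for an existential quantifier, product for a universal one, over 0/1-valued factors) can be illustrated on a toy instance. The sketch below is a brute-force evaluator written only for this illustration. The function name `count_qcq`, the encoding of atoms as (scope, relation) pairs, and the sample relations are all assumptions. It is not the algorithm developed in the paper.

```python
from functools import reduce
from itertools import product

def count_qcq(dom, n_free, quantifiers, atoms):
    """Count satisfying assignments of the free variables of a quantified conjunctive query.

    quantifiers: 'exists' / 'forall' for the bound variables, outermost first.
    atoms: list of (scope, relation); scope is a tuple of 0-based variable indices,
           relation is a set of value tuples.
    """
    def psi(scope, rel, x):
        # 0/1 indicator factor of an atom
        return 1 if tuple(x[i] for i in scope) in rel else 0

    def eval_bound(x, k):
        # x assigns the free variables and the first k bound variables
        if k == len(quantifiers):
            return reduce(lambda a, b: a * b, (psi(s, r, x) for s, r in atoms), 1)
        vals = [eval_bound(x + (v,), k + 1) for v in dom]
        if quantifiers[k] == "exists":
            return max(vals)                           # exists  ->  max over {0, 1}
        return reduce(lambda a, b: a * b, vals, 1)     # forall  ->  product over {0, 1}

    return sum(eval_bound(x, 0) for x in product(dom, repeat=n_free))

# Phi(X1) = exists X2 forall X3 ( R(X1, X2) and S(X2, X3) )
dom = (0, 1)
R = {(0, 0), (0, 1), (1, 1)}
S = {(0, 0), (0, 1), (1, 0)}
faq_count = count_qcq(dom, 1, ["exists", "forall"], [((0, 1), R), ((1, 2), S)])

# Reference: evaluate the logical formula directly.
logic_count = sum(
    any(all((x1, x2) in R and (x2, x3) in S for x3 in dom) for x2 in dom)
    for x1 in dom
)
```

On this instance only `X1 = 0` satisfies the formula, and the aggregate form and the direct logical evaluation agree.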

It turns out that these and dozens of other fundamental problems from constraint satisfaction (CSP), 
databases, matrix operations, PGM inference, logic, coding theory, and complexity theory can be viewed 
as special instances of a generic problem we call the Functional Aggregate Query, or the FAQ problem, 
which we define next. The first two columns in Table 1 present eight of these problems. See [8,30,57] and 
Appendix A for many more examples. 
1.2 The FAQ problem

Throughout the paper, we use the following convention. Uppercase $X_i$ denotes a variable, and lowercase $x_i$
denotes a value in the domain $\mathrm{Dom}(X_i)$ of the variable. Furthermore, for any subset $S \subseteq [n]$, define

$$X_S = (X_i)_{i \in S}, \qquad x_S = (x_i)_{i \in S} \in \prod_{i \in S} \mathrm{Dom}(X_i).$$

In particular, $X_S$ is a tuple of variables and $x_S$ is a tuple of specific values with support $S$. The input to FAQ
is a set of functions and the output is a function computed using a series of aggregates over the variables and
input functions. More specifically, for each $i \in [n]$, let $X_i$ be a variable on some discrete domain $\mathrm{Dom}(X_i)$,
where $|\mathrm{Dom}(X_i)| \ge 2$. The FAQ problem is to compute the following function

$$\varphi(x_{[f]}) = \bigoplus^{(f+1)}_{x_{f+1} \in \mathrm{Dom}(X_{f+1})} \cdots \bigoplus^{(n)}_{x_n \in \mathrm{Dom}(X_n)} \; \bigotimes_{S \in \mathcal{E}} \psi_S(x_S), \qquad (1)$$


where 

• $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is a multi-hypergraph. $\mathcal{V} = [n]$ is the index set of the variables $X_i$, $i \in [n]$. Overloading
notation, $\mathcal{V}$ is also referred to as the set of variables.

• The set $F = [f]$ is the set of free variables for some integer $0 \le f \le n$. Variables in $\mathcal{V} - F$ are called
bound variables.

• $D$ is a fixed domain, such as $\{\text{true}, \text{false}\}$, $\{0, 1\}$ or $\mathbb{R}_+$.

• For every hyperedge $S \in \mathcal{E}$, $\psi_S : \prod_{i \in S} \mathrm{Dom}(X_i) \to D$ is an input function (also called a factor). There
are $m = |\mathcal{E}|$ hyperedges.

• For every bound variable $i > f$, $\oplus^{(i)}$ is a binary (aggregate) operator on the domain $D$. Different
bound variables may have different aggregate operators.

• Finally, for each bound variable $i > f$ either $\oplus^{(i)} = \otimes$ or $(D, \oplus^{(i)}, \otimes)$ forms a commutative semiring¹
(with the same additive identity 0 and multiplicative identity 1). If $\oplus^{(i)} = \otimes$, then $\oplus^{(i)}$ is called a
product aggregate; otherwise, it is a semiring aggregate.

To avoid triviality, we assume that there is at least one semiring aggregate. (The semiring requirement is not
as much of a restriction as one might think at first glance. In Appendix B, we describe several methods for
‘turning’ non-semiring aggregates into semiring aggregates.) Because for $i > f$ every variable $X_i$ has its own
aggregate $\oplus^{(i)}$ over all values $x_i \in \mathrm{Dom}(X_i)$, in the rest of the paper we will write $\bigoplus^{(i)}_{x_i}$ to mean
$\bigoplus^{(i)}_{x_i \in \mathrm{Dom}(X_i)}$.

We will often refer to $\varphi$ as an FAQ-query. We use FAQ-SS² to denote the special case of FAQ when there
is only one variable aggregate, i.e. $\oplus^{(i)} = \oplus$, $\forall i > f$, and $(D, \oplus, \otimes)$ is a semiring. The special case of FAQ-SS
when there is no free variable is called the SumProd problem. As shall be further discussed in Section 3,
SumProd and FAQ-SS are well-studied problems.

1.3 Input and output representation 

To make the problem definition complete, we will also have to specify how the input and output functions 
of an FAQ instance are represented. As we shall see in Section 8, this is a subtle issue that vastly affects the
landscape of tractability of the problem. 

¹ A triple $(D, \oplus, \otimes)$ is a commutative semiring if $\oplus$ and $\otimes$ are commutative binary operators over $D$ satisfying the following:
(1) $(D, \oplus)$ is a commutative monoid with an additive identity, denoted by 0. (2) $(D, \otimes)$ is a commutative monoid with
a multiplicative identity, denoted by 1. (In the usual semiring definition, we do not need the multiplicative monoid to be
commutative.) (3) $\otimes$ distributes over $\oplus$. (4) For any element $e \in D$, we have $e \otimes 0 = 0 \otimes e = 0$.

² FAQ with a Single Semiring.
To streamline the presentation, in the first part of this paper we will assume that both the input and
output factors are represented using the listing representation: each factor $\psi_S$ is a table of all tuples of the
form $\langle x_S, \psi_S(x_S) \rangle$ such that $\psi_S(x_S) \neq 0$. (In particular, entries not in the table are 0-entries.) This
representation is commonly used in the CSP, databases, and sparse matrix computation domains.
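To make the listing representation concrete, here is a minimal Python sketch, assuming a factor over two variables is given as a dense table. The helper names `to_listing` and `value` are hypothetical and chosen only for this illustration. Only non-zero entries are stored, and any tuple missing from the table is read back as the additive identity 0.

```python
def to_listing(dense, zero=0):
    """Convert a dense 2-D table (list of rows) to the listing representation:
    a dict from index tuples to values, keeping only the non-zero entries."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != zero}

def value(listing, x, zero=0):
    """Look up a tuple; entries absent from the listing are 0-entries."""
    return listing.get(x, zero)

dense = [[0, 0, 5],
         [0, 0, 0],
         [7, 0, 0]]
listing = to_listing(dense)   # two stored tuples instead of nine table cells
```

The size of a factor in this representation is the number of stored tuples, which is the quantity the input size refers to under the listing assumption.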

Our algorithms are in fact more generic: they work for a general class of input and output representations,
as discussed in Section 8.

1.4 Paper organization 

Section 2 summarizes the contributions of the paper and sketches the line of attack. Related works are 
discussed in Section 3. Section 4 defines notations, terminologies, and establishes a few facts used throughout 
the paper. Section 5 discusses the main ideas behind InsideOut and analyzes its runtime given a variable 
ordering. Section 6 explains how to characterize variable orderings that are “semantically-equivalent” to 
the original ordering in the given FAQ-query. Section 7 explains how to efficiently search through all those 
equivalent variable orderings to find the “best” one (i.e. the one that allows InsideOut to run the fastest). 
Finally, Section 8 presents the effect of input and output representations; in particular, it shows how InsideOut 
is still useful for problems such as SAT and #SAT. 

2 Summary of contributions 

2.1 Conceptual contribution 

The formulation of FAQ has its roots in the SumProd and more generally FAQ-SS problems, which have been 
studied by Dechter [30], Aji and McEliece [8] and Kohlas and Wilson [57]. The SumProd problem is
exactly the special case of FAQ when all variable aggregates are semiring aggregates over the same semiring, 
and there is no free variable. We will discuss more of the history of this problem in Section 3. 

FAQ substantially generalizes SumProd, as FAQ can now capture problems in logic such as QCQ (quantified
conjunctive query) or #QCQ (sharp quantified conjunctive query). We argue that FAQ is a very powerful
way of thinking about these problems and related issues. FAQ can be thought of as a declarative query 
language over functions. For example, we show in Section 8 how different input representations can vastly 
affect the landscape of tractability of the problem, and how the output representation is related to the notion 
of factorized databases [73]. 

2.2 Algorithmic contribution 

We present a single algorithm, called InsideOut, to solve FAQ. InsideOut is a variation of the variable 
elimination algorithm [30,93,94]. In PGM, variable elimination was first proposed by Zhang and Poole [93]. 
Then Dechter [30] observed that this strategy can be applied to problems on other semirings such as constraint 
satisfaction and SAT solving. In the database literature, Yannakakis’ algorithm [90] can also be cast as 
variable elimination under the set semiring or Boolean semiring.³

InsideOut adds three minor twists to the basic variable elimination strategy. First, we use a backtracking-search
strategy called OutsideIn to compute the intermediate results. This strategy allows us to use recent
worst-case optimal join algorithms [2,67,69,89] to compute intermediate results within the fractional edge 
cover bound [11,46]. Second, we introduce the idea of an indicator projection of a function onto a given set 
of variables to obtain the fractional hypertree width style of runtime guarantee [47]. Third, in addition to 
making use of the distributive law to ‘fold’ common factors [8] when we face a semiring aggregate, we apply 
a swap between an aggregate and the inside product when that aggregate is also a product. 
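For orientation, the basic variable elimination strategy that InsideOut builds on can be sketched in a few lines of Python for the single-semiring case. This is a simplified illustration under stated assumptions, not InsideOut itself: intermediate results are computed with a naive pairwise join rather than OutsideIn or a worst-case optimal join, there are no indicator projections, the product is fixed to ordinary multiplication, and every eliminated variable is assumed to occur in some factor. All function names are hypothetical. Factors are (scope, table) pairs in the listing representation.

```python
import operator

def matrix_factor(a, row_var, col_var):
    """Listing representation of a matrix as a factor over (row_var, col_var)."""
    table = {(i, j): v for i, row in enumerate(a) for j, v in enumerate(row) if v != 0}
    return (row_var, col_var), table

def join(f, g):
    """Product of two factors; tuples are combined when they agree on shared variables."""
    (sf, tf), (sg, tg) = f, g
    scope = sf + tuple(v for v in sg if v not in sf)
    out = {}
    for xf, vf in tf.items():
        af = dict(zip(sf, xf))
        for xg, vg in tg.items():
            ag = dict(zip(sg, xg))
            if all(af[v] == ag[v] for v in sg if v in af):
                merged = {**af, **ag}
                out[tuple(merged[v] for v in scope)] = vf * vg
    return scope, out

def eliminate(factors, var, add):
    """Multiply all factors that mention `var`, then aggregate `var` away with `add`."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    if not touching:
        raise ValueError("sketch assumes every eliminated variable occurs in some factor")
    current = touching[0]
    for f in touching[1:]:
        current = join(current, f)
    scope, table = current
    keep = tuple(v for v in scope if v != var)
    idx = [scope.index(v) for v in keep]
    out = {}
    for x, val in table.items():
        key = tuple(x[i] for i in idx)
        out[key] = add(out[key], val) if key in out else val
    return rest + [(keep, out)]

def variable_elimination(factors, order, free, add=operator.add):
    """Eliminate the bound variables in `order`; return a listing over the `free` variables."""
    for var in order:
        factors = eliminate(factors, var, add)
    current = factors[0]
    for f in factors[1:]:
        current = join(current, f)
    scope, table = current
    idx = [scope.index(v) for v in free]
    return {tuple(x[i] for i in idx): val for x, val in table.items()}

# Matrix chain A1 (3x2) * A2 (2x3) * A3 (3x1) as factors over variables 1..4.
A1 = [[1, 2], [3, 4], [5, 6]]
A2 = [[1, 0, 2], [0, 1, 1]]
A3 = [[2], [1], [3]]
factors = [matrix_factor(A1, 1, 2), matrix_factor(A2, 2, 3), matrix_factor(A3, 3, 4)]
result_23 = variable_elimination(factors, order=[2, 3], free=(1, 4))
result_32 = variable_elimination(factors, order=[3, 2], free=(1, 4))
```

Both elimination orders return the same output factor; they differ only in the sizes of the intermediate factors, which is what the choice of variable ordering controls.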

We show that InsideOut runs in time $\tilde{O}(N^{\mathrm{faqw}(\sigma)} + \|\varphi\|)$, where $\sigma$ is a variable ordering that we choose
to run the algorithm on, $N$ is the input size, $\|\varphi\|$ is the output size (under the ‘listing representation’
of input and output factors), and $\mathrm{faqw}(\sigma)$ is a parameter called the (fractional) FAQ-width of $\sigma$. FAQ-width
is the FAQ-analog of the induced fractional hypertree width of a variable ordering. (See Definition 5.10.)
In this paper, we use $\tilde{O}$ to hide a logarithmic factor in data complexity and a polynomial factor in query
complexity.

³ It is well-known [8,58] that variable elimination and message passing are equivalent in the special case of FAQ-SS.

In fact, Section 8 shows that the variable elimination framework is still powerful in cases when the 
fractional hypertree width bounds are no longer applicable. These are special cases of FAQ where the input 
functions are compactly represented. In particular, we explain how with a suitable modification InsideOut 
can be used to recover recently known beyond worst-case results in join algorithms (Minesweeper [65], and 
Tetris [2]), and results on the tractability of SAT and #SAT for β-acyclic formulas [18,74].

2.3 Main technical contributions 

2.3.1 “Width” of an FAQ 

In light of InsideOut running in time $\tilde{O}(N^{\mathrm{faqw}(\sigma)} + \|\varphi\|)$ for a given variable order $\sigma$, the key technical problem
is choosing a $\sigma$ that minimizes $\mathrm{faqw}(\sigma)$. This is where the generality of FAQ requires new techniques and
results. Traditional variable elimination for CSPs or PGM inference also requires computing a good variable 
ordering to minimize the (induced) treewidth [58] or fractional hypertree width [47] of the variable ordering. 
However, in those cases all variable orderings are valid; hence, all we have to do in this traditional setting is 
to compute a tree decomposition whose maximum bag size (or maximum fractional edge cover number over 
the bags) is minimized; then, the GYO-elimination procedure will produce a good variable ordering (see 
Section 4). In the general setting of FAQ, just like in logic where there are alternating quantifiers, the set 
of semantically equivalent variable orderings depends on both the scoping structure specified by the input 
query expression and the connectivity structure of the query’s hypergraph. 

To see how the query’s hypergraph affects the set of equivalent variable orderings, consider the following 
simple example. A natural class of valid permutations to consider are those that only permute aggregates 
in a maximal block of identical aggregates in the query expression. However, taking the query hypergraph 
into account, one can do much better. Consider, for example, the FAQ-query 

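$$\varphi = \max_{x_1} \sum_{x_2} \max_{x_3} \cdots \sum_{x_{2k}} \psi_{\{1,3,\dots,2k-1\}} \, \psi_{\{2,4,\dots,2k\}},$$

where both factors have range $\mathbb{R}_+$. In this case, even though max and $\sum$ do not commute with one another,
we can rewrite $\varphi$ using any of the $(2k)!$ variable orderings and still obtain the same result. The reason is
that the two factors share no variables and are non-negative, so every ordering evaluates to the maximum of
the first factor times the sum of the second. (The aggregates have to be permuted along with the variables
to which they are attached.)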
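The claim can be checked numerically for $k = 2$. The following Python sketch is a brute-force evaluator written for this illustration only; the name `evaluate`, the random integer factors and the seed are assumptions. It evaluates the query under all $4! = 24$ orderings, carrying each aggregate along with its variable, and contrasts this with a single shared factor, where max and sum visibly do not commute.

```python
from itertools import permutations, product
import random

def evaluate(order, aggs, dom, factors):
    """Evaluate nested aggregates; `order` lists variables outermost first,
    aggs[v] is 'max' or 'sum', factors are (scope, table) pairs."""
    def rec(k, x):
        if k == len(order):
            out = 1
            for scope, table in factors:
                out *= table[tuple(x[v] for v in scope)]
            return out
        v = order[k]
        vals = [rec(k + 1, {**x, v: a}) for a in dom]
        return max(vals) if aggs[v] == "max" else sum(vals)
    return rec(0, {})

random.seed(0)
dom = range(3)
psi_odd = {x: random.randint(0, 9) for x in product(dom, repeat=2)}    # factor on (X1, X3)
psi_even = {x: random.randint(0, 9) for x in product(dom, repeat=2)}   # factor on (X2, X4)
factors = [((1, 3), psi_odd), ((2, 4), psi_even)]
aggs = {1: "max", 2: "sum", 3: "max", 4: "sum"}

# All 24 orderings of the four variables, aggregates permuted along with them.
values = {evaluate(order, aggs, dom, factors) for order in permutations((1, 2, 3, 4))}

# Contrast: one factor on (X1, X2), so max and sum interact and the order matters.
identity = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
shared = [((1, 2), identity)]
max_then_sum = evaluate((1, 2), {1: "max", 2: "sum"}, range(2), shared)
sum_then_max = evaluate((2, 1), {1: "max", 2: "sum"}, range(2), shared)
```

In the first case `values` contains a single number, equal to the maximum of the odd factor times the sum of the even factor. In the contrast case the two orderings give different results, which is why the hypergraph structure, and not only the sequence of aggregates, determines which reorderings are valid.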

Even for the special case of FAQ-SS, where there is only one type of semiring aggregate and hence all
permutations are valid, computing the optimal variable ordering is already NP-hard in query complexity,
because computing the (fractional hyper) treewidth of the query hypergraph is NP-hard. (See Gottlob
et al. [37,42] for a recent survey and new results on this topic.) Hence, the extra complication of only
considering ‘valid’ orderings for FAQ seems to make our task much harder. Somewhat surprisingly, we are
able to show that the complexity of computing the optimal ordering for general FAQ is essentially the same
as the complexity of computing the optimal ordering for FAQ-SS instances. Figure 1 presents a schematic
summary of our main technical contributions, described in more detail below.

• Given an FAQ-query $\varphi$, we define the set $\mathrm{EVO}(\varphi)$ of all variable orderings semantically equivalent to $\varphi$.
Roughly, for any $\sigma \in \mathrm{EVO}(\varphi)$, if we rewrite the expression for $\varphi$ using the ordering $\sigma$ (with all aggregates
permuted along), then we obtain a function identical to $\varphi$, no matter what the input factors are. The
FAQ-width of $\varphi$ is $\mathrm{faqw}(\varphi) = \min_{\sigma \in \mathrm{EVO}(\varphi)} \mathrm{faqw}(\sigma)$. (If the FAQ instance was a SumProd instance, then
$\mathrm{faqw}(\varphi)$ is exactly the fractional hypertree width of the query.)

In Figure 1, suppose we are looking at the top/orange path: from the input expression $\varphi$ with variable
order $\sigma$, if we somehow were able to compute the optimal variable ordering $\sigma^* = \arg\min_{\tau \in \mathrm{EVO}(\varphi)} \mathrm{faqw}(\tau)$,
then we would have an algorithm running in time $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \|\varphi\|)$. Here, $\|\varphi\|$ denotes the output
size. However, even if we were willing to spend an exponential time in query complexity, in order to
take the orange path and find $\sigma^*$ we need to be able to characterize the set $\mathrm{EVO}(\varphi)$. In particular,
given a variable ordering $\tau$, how do we know whether $\tau \in \mathrm{EVO}(\varphi)$ or not? Our answer to this question
comes next.

Figure 1: Summary of our technical contributions. (Schematic with two paths from the input FAQ-expression
$\sigma$ for $\varphi$ with hypergraph $\mathcal{H}$ to InsideOut. Top path: over the set $\mathrm{EVO}(\varphi)$ of expressions semantically
equivalent to $\varphi$, find $\sigma^* = \arg\min_{\tau \in \mathrm{EVO}(\varphi)} \mathrm{faqw}(\tau)$, giving runtime
$\tilde{O}(N^{\mathrm{faqw}(\sigma^*)} + \|\varphi\|) = \tilde{O}(N^{\mathrm{OPT}} + \|\varphi\|)$. Bottom path: build the expression tree and precedence poset $P$
in time $\mathrm{poly}(|\mathcal{H}|)$, where $\mathrm{LinEx}(P) \subseteq \mathrm{EVO}(\varphi)$, $\mathrm{EVO}(\varphi) = \mathrm{CWE}(\mathrm{LinEx}(P))$ and
$\min_{\tau \in \mathrm{LinEx}(P)} \mathrm{faqw}(\tau) = \min_{\tau \in \mathrm{EVO}(\varphi)} \mathrm{faqw}(\tau)$; then, via a tree decomposition of $\mathcal{H}$, obtain an
FAQ-expression with FAQ-width at most $\mathrm{OPT} + g(\mathrm{OPT})$, where $g$ is the approximation factor for the fractional
hypertree width of $\mathcal{H}$, giving runtime $\tilde{O}(N^{\mathrm{OPT} + g(\mathrm{OPT})} + \|\varphi\|)$.)

• We describe how to (in poly-time, query complexity) construct an expression tree for the input FAQ-query.
The expression tree induces a partially ordered set on the variables called the precedence poset.
By defining a notion called component-wise equivalence (CWE), we completely characterize $\mathrm{EVO}(\varphi)$:
$\sigma \in \mathrm{EVO}(\varphi)$ if and only if it is component-wise equivalent to some linear extension of the precedence
poset. In fact, we also show that checking whether $\sigma \in \mathrm{EVO}(\varphi)$ can be done in polynomial time in
query complexity.

• While membership in $\mathrm{EVO}(\varphi)$ is verifiable in polynomial time, as mentioned above the optimization
problem $\sigma^* = \arg\min_{\tau \in \mathrm{EVO}(\varphi)} \mathrm{faqw}(\tau)$ is NP-hard. In some applications such as PGM-inference, we
cannot sweep query complexity under the rug as we do in database applications. It is thus natural
to design approximation algorithms for $\mathrm{faqw}(\varphi)$. To this end, we prove another technical result: going
through all orderings in $\mathrm{EVO}(\varphi)$ is not necessary. The set of linear extensions of the precedence
poset is sufficient: every linear extension of the precedence poset is semantically equivalent to $\varphi$, and
every $\sigma \in \mathrm{EVO}(\varphi)$ has the same FAQ-width as some linear extension of this poset.⁴

• Finally, using an approximation algorithm for the fractional hypertree width (fhtw) with approximation
ratio $g(\mathrm{fhtw})$ as a blackbox,⁵ and using the expression tree as a guide, we give an approximation
algorithm computing an ordering $\sigma$ such that $\mathrm{faqw}(\sigma) \le \mathrm{OPT} + g(\mathrm{OPT})$, where $\mathrm{OPT} = \mathrm{faqw}(\varphi)$. (In the
FAQ-SS case, our approximation guarantee is slightly better: $\mathrm{faqw}(\sigma) \le g(\mathrm{OPT})$.)

⁴ If the instance is an FAQ-SS instance, then the poset imposes no order on the non-free variables, i.e. the set of linear
extensions is the set of all possible variable orderings of non-free variables.

⁵ The best known such algorithm due to Marx [60] has $g(\mathrm{fhtw}) = O(\mathrm{fhtw}^3)$.
2.3.2 The effect of input and output representation

The input to FAQ is a collection of functions. We observe that the representation of the input functions has 
a huge effect on the computational complexity of the problem. 

We begin with the representation of the input factors. One option is the truth table representation for
each factor involved in the FAQ instance $\varphi$ (e.g. the conditional probability table in PGM [58] or the usual
2D-array representation of matrices). This representation is wasteful when the input factors have many
0-entries. Another option, which is commonly used in the CSP, databases, and sparse matrix computation
domains, is to list pairs $\langle x_S, \psi_S(x_S) \rangle$ for a given factor $\psi_S$ such that $\psi_S(x_S) \neq 0$. The results listed in Table 1
implicitly assumed the listing representation.

However, our results can handle even more succinct representations such as GDNFs and decision
diagrams [25] in CSP literature and algebraic decision diagrams in the PGM literature [12]. We present a
common view of these representations: an input factor $\psi_S(x_S)$ might itself be the output of some other
FAQ instance $\varphi'_S$ on the (free) variables $x_S$. The succinctness in the representation comes from representing
factors of $\varphi'_S$ in the listing format (instead of listing $\psi_S$). Technically, we analyze what happens when one
composes an FAQ instance with another. More interestingly, this framework is general enough to present a
class of structured matrices (including the DFT matrices) for which we can quantify how much better our
algorithm runs than the naive quadratic time algorithm.

Furthermore, we observe that when the input factors are too “compact”, then the class of tractable FAQ-queries
is much smaller, and the fractional hypertree width framework no longer applies. Nevertheless, the
variable elimination strategy is still a powerful approach. We will, however, have to change the algorithm
used to solve sub-problems: it can no longer be a backtracking-search-style algorithm. We give four examples
where we explain how recent beyond worst-case results in join algorithms (Minesweeper [65], and Tetris [2]),
and the tractability of SAT and #SAT for β-acyclic formulas [18,74], all of which are special cases of FAQ,
can be explained using this idea.

Last but not least, we make several observations regarding the representations of the output function $\varphi$.
For the case when the FAQ instance $\varphi$ has no free variables, the algorithm needs to output a single element
from $D$. However, when $\varphi$ has free variables, then we have to also represent the output somehow. The
default option is to also use the listing representation for the output. However, InsideOut is general enough
to produce an FAQ instance as the output. This issue is slightly subtle, as $\varphi$ is already an FAQ
instance that represents the output, but it is not an “interesting” representation. However, the generality of
our algorithm allows the input factors and output both to be represented as FAQ instances. Our observations
here are very close in spirit to the recent results on database factorization of Olteanu and Zavodny [73].

2.4 Highlights of our corollaries 

In light of the fact that many problems can be reduced to FAQ, Table 1 presents a selected subset of corollaries
that our results imply.⁶ We list the results assuming the optimal variable ordering is already given. (This
holds true for both known results and our results.) When the optimal variable ordering is not given, the
exponents of $N$ in all cases have to be changed to the best known approximating factors for the corresponding
width. In the FAQ case, that would be $O(\mathrm{faqw}^3(\varphi))$.

For each problem, the table lists the corresponding FAQ instance, the runtime of the previously best 
known algorithm, and the runtime of InsideOut. These are the problems that we would like to highlight, as 
they yield either new results or an alternative interpretation of known results in the FAQ framework. 

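The results in Table 1 roughly span three areas: (1) CSPs and Logic; (2) PGMs and (3) Matrix operations.
Except for joins, problems in area (1) need the full generality of our FAQ formulation, where InsideOut either
improves upon existing results or yields new results. Problems in area (2) can already be reduced to FAQ-SS.
Here, InsideOut improves upon known results since it takes advantage of Grohe and Marx’s more recent
fractional hypertree width bounds. Finally, problems in area (3) of Table 1 are classic. InsideOut does not yield
anything new here, but it is intriguing to be able to explain the textbook dynamic programming algorithm
for Matrix-Chain Multiplication [28] as an algorithm to find a good variable ordering for the corresponding
FAQ-instance. The DFT result is a re-writing of Aji and McEliece’s observation [8].⁷

⁶ The results listed in the table implicitly assumed the listing representation of input factors.

| Problem | FAQ formulation | Previous Algo. | Our Algo. |
|---|---|---|---|
| #QCQ | $\sum_{(x_1,\dots,x_f)} \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \prod_{S \in \mathcal{E}} \psi_S(x_S)$, where $\oplus^{(i)} \in \{\max, \times\}$ | No non-trivial algo | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| QCQ | $\bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \prod_{S \in \mathcal{E}} \psi_S(x_S)$, where $\oplus^{(i)} \in \{\max, \times\}$ | $\tilde{O}(N^{\mathrm{PW}(\mathcal{H})} + \lVert\varphi\rVert)$ [24] | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| #CQ | $\sum_{(x_1,\dots,x_f)} \max_{x_{f+1}} \cdots \max_{x_n} \prod_{S \in \mathcal{E}} \psi_S(x_S)$ | $\tilde{O}(N^{\mathrm{DM}(\mathcal{H})} + \lVert\varphi\rVert)$ [34] | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| Joins | $\bigcup_{x} \prod_{S \in \mathcal{E}} \psi_S(x_S)$ | $\tilde{O}(N^{\mathrm{fhtw}(\mathcal{H})} + \lVert\varphi\rVert)$ [46] | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| Marginal | $\sum_{(x_{f+1},\dots,x_n)} \prod_{S \in \mathcal{E}} \psi_S(x_S)$ | $\tilde{O}(N^{\mathrm{htw}(\varphi)} + \lVert\varphi\rVert)$ [54] | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| MAP | $\max_{(x_{f+1},\dots,x_n)} \prod_{S \in \mathcal{E}} \psi_S(x_S)$ | $\tilde{O}(N^{\mathrm{htw}(\varphi)} + \lVert\varphi\rVert)$ [54] | $\tilde{O}(N^{\mathrm{faqw}(\varphi)} + \lVert\varphi\rVert)$ |
| MCM | $\sum_{x_2,\dots,x_n} \prod_{i=1}^{n} \psi_{i,i+1}(x_i, x_{i+1})$ | DP bound [28] | DP bound |
| DFT | $\sum_{(y_0,\dots,y_{m-1}) \in \mathbb{Z}_p^m} b_y \cdot \prod_{0 \le j+k < m} e^{i 2\pi \frac{x_j \cdot y_k}{p^{m-j-k}}}$ | $O(N \log_p N)$ [27] | $O(N \log_p N)$ |

Table 1: Runtimes of algorithms assuming optimal variable ordering is given. Problems shaded red (#QCQ,
QCQ, #CQ, Joins) are in CSPs and logic ($D = \{0,1\}$ for CSP and $D = \mathbb{N}$ for #CSP), problems shaded green
(Marginal, MAP) fall under PGMs ($D = \mathbb{R}_+$), and problems shaded blue (MCM, DFT) fall under matrix
operations ($D = \mathbb{C}$). $N$ denotes the size of the largest factor (assuming they are represented with the listing
format). $\mathrm{htw}(\varphi)$ is the notion of integral cover width defined in [54] for PGM. $\mathrm{PW}(\mathcal{H})$ is the optimal width
of a prefix graph of $\mathcal{H}$ from [24] and $\mathrm{DM}(\mathcal{H}) = \mathrm{poly}(F\text{-ss}(\mathcal{H}), \mathrm{fhtw}(\mathcal{H}))$, where $F\text{-ss}(\mathcal{H})$ is the
$[f]$-quantified star size [34]. $\|\varphi\|$ is the output size in listing representation. Our width $\mathrm{faqw}(\varphi)$ is never
worse than any of the three and there are classes of queries where ours is unboundedly better than all three.
In DFT, $N = p^m$ is the length of the input vector. $\tilde{O}$ hides a logarithmic factor in data complexity and
polynomial factor in query complexity.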

It should be noted that the prior results on #CQ [34] and QCQ [24] focused on dichotomy theorems for 
bounded-arity classes of input hypergraphs, not just on the best possible runtime one can get. Our faqw 
notion is a generalization of fractional hypertree width, which steps into the unbounded arity world. See 
Marx [62] for a more detailed discussion of known results on CSP in the unbounded-arity case. 

3 Related work 

Since FAQ encompasses so many areas, our related work discussion is necessarily incomplete. Appendix G 
discusses more related works on the factor/function representation issue. 

3.1 Problems on one semiring 

As was mentioned earlier, SumProd is the special case of FAQ when all variable aggregates are semiring
aggregates over the same semiring, and there is no free variable. This special case is powerful enough to capture
many problems (e.g. it captures all CSPs). This problem was implicitly defined by Dechter [30],
who solved it using variable elimination.

⁷ Note that we have further results on matrix vector multiplication for structured matrices (see Appendix G.3).

To the best of our knowledge, the FAQ-SS problem was explicitly defined by Aji and McEliece [8], who
called it the MPF problem (for Marginalize the product function).⁸ They presented a message passing
algorithm for FAQ-SS and essentially showed that their algorithm meets the treewidth bound. Their paper also
lists a number of problems that are FAQ-SS instances, including Matrix Chain Multiplication and Matrix Vector
Multiplication. (Their treatment of Matrix Chain Multiplication is less specific than our result: they argue only
that different variable orderings give rise to different ways of parenthesizing the matrix chain multiplication.)
They showed that their general algorithm contains FFT as a special case. We re-phrase their interpretation
of the FFT using InsideOut. They also showed that many basic decoding problems in coding theory can be
cast as FAQ-SS instances.

Kohlas and Wilson [57] presented even more applications of the FAQ-SS problem. The paper categorized 
various existing message passing algorithms depending on what extra properties they need beyond $(D, \oplus, \otimes)$
being a commutative semiring. Their paper also explored algorithms for approximate computations (while in 
this work we solely deal with exact computation). Approximate computations in PGMs have been explored 
under the semiring framework [80]. 

Most of the results in the PGM literature present algorithms that are shown to obtain the treewidth bound. 
To the best of our knowledge, the finest hypergraph width parameter used to bound the performance of PGM 
inference algorithms is the integral hypertree width bound of [43], which appeared in [31,54]. See Sections 3.3 
and 4.3 for a more detailed discussion on the various width parameters. 

In the database literature, recently Koch [56] described an algebraic query language called AGCA over 
‘rings of databases’ which is somewhat similar in spirit to FAQ. This framework makes use of additive 
inverses to allow for efficient view maintenance. 

3.2 Factorized databases 

Bakibayev et al. [13] and Olteanu and Zavodny [73] introduced the notion of factorized databases, and
showed how one can efficiently compute join and aggregates over factorized databases. In hindsight there
is much in common between their approach and InsideOut applied to the single semiring case of FAQ-SS.
Both approaches have the same runtime complexity, because both are dynamic programming algorithms:
InsideOut is bottom-up, while factorized database computation is top-down (memoized).

The FAQ framework is more general in that it can handle multiple aggregate types. Our contribution 
also involves the characterization of EVO and an approximation algorithm for faqw. On the other hand, 
aspects of factorized database that FAQ does not handle include the evaluation of SQL queries and output 
size bounds on the factorized representations. 

3.3 Width parameters 

Various notions of hypergraph ‘widths’ have been developed over the years in PGM, CSP, and database 
theory. In particular, two often-used properties of the input query are acyclicity and bounded width. 

When the query is acyclic, the classic algorithm of Yannakakis [90] for relational joins (and CSPs) runs 
in time linear in the input plus output size, modulo a log factor. Similarly, Pearl’s belief propagation 
algorithm [77] works well for acyclic graphical models. As we briefly touch upon in Appendix F.1, Yannakakis’
algorithm is essentially belief propagation on the Boolean semiring or set semiring. The algorithm can also 
be reinterpreted using InsideOut. 

Subsequent works on databases and CSPs further expand the classes of queries that can be evaluated in
polynomial time. These works define progressively more general width parameters for a query, which
intuitively measure how far a query is from being acyclic. Roughly, these results state that if the corresponding
width parameter is bounded by a constant, then the query is ‘tractable,’ i.e. there is a polynomial-time
algorithm to evaluate it. For example, Gyssens et al. [50,51] showed that queries with bounded degree of
acyclicity are tractable. Then came query width (qw) from Chekuri and Rajaraman [22], hypertree width and
generalized hypertree width (ghw) from Gottlob et al. [43,82], and fractional hypertree width from Grohe and
Marx [47]. See [42] for a survey and [37] for the latest in this line of work. Marx developed stronger width
parameters called adaptive width and submodular width [61,62], which were recently extended to functional
dependencies and degree bounds [3,4].

⁸ Though related problems had been defined before: see e.g. [14].

In the PGM literature, the most common parameter is treewidth, as the textbook variable elimination
and message passing algorithms are often stated to run in time $O(N^{w+1})$ where $w$ is the treewidth of the
model [58]. Freuder [38] and Dechter and Pearl [32] showed in the late 1980s that CSP instances with bounded
treewidth are tractable.

In the logic/finite model theory literature, several width parameters were also developed [6,24,34]. We 
will describe them later in the relevant sections of the paper. 

3.4 Finite model theory 

In [79], Pichler and Skritek studied the #CQ problem in the special case where the query is acyclic. We 
refer to this special case as #ACQ. In particular, they showed that #ACQ is tractable in data complexity
(i.e. when the number of variables that we are counting over is a constant) and in query complexity (i.e. 
when all relations have constant sizes) but not in combined complexity where the problem turns out to be 
#P-complete. 

In [34], Durand and Mengel introduced a new parameter for #CQ called the quantified star size. It is
basically a measure of how free variables are connected in the query’s hypergraph. Along with bounded 
generalized hypertree width (or fractional hypertree width), bounded quantified star size characterizes the 
classes of #CQ instances that are tractable in the bounded arity (or bounded generalized hypertree width) 
case. 

The quantified star size idea has been expanded later by applying it to the core of the query instead of 
the original query [45]. The core is a minimal subquery that is homomorphic to the original query. Because
homomorphism does not preserve counts (i.e. it is not a one-to-one mapping), free variables have to be 
explicitly preserved by taking the core of the color query of the original query. Further development along 
with new lower bounds appeared in [26]. 

QCQ (second row of Table 1) has its own long line of research. An early and interesting result was 
the tractability of QCQ when the domain size, treewidth, and number of quantifier alternations are all 
constants [23]. More recently, Chen and Dalmau introduced a width parameter for QCQ based on elimination 
orderings [24]. In particular, they take the minimum width over some variable orderings that are equivalent 
to the original query. They showed the tractability of QCQ when their width is bounded. 

The runtimes of InsideOut for CQ, #CQ, and QCQ are unboundedly better than the above results, as shown
in Table 1. Our result on #QCQ is new: to the best of our knowledge no non-trivial efficient algorithms
for #QCQ were known prior to our work. Essentially, InsideOut is able to unify the above results under the
same umbrella. 


4 Preliminaries 

4.1 Factors, their representations, and sizes 

In relational database systems [1,59,88], constraint satisfaction, and sparse matrix operations [92], the 
following representation of input and output factors is the most common: 

Definition 4.1 (Listing representation). In the listing representation, each factor $\psi_S$ is a table of all tuples of
the form $\langle \mathbf{x}_S, \psi_S(\mathbf{x}_S) \rangle$ such that $\psi_S(\mathbf{x}_S) \neq \mathbf{0}$. In particular, entries not in the table are $\mathbf{0}$-entries.

We will assume in most of this paper that all input and output factors are represented using the listing 
representation. Section 8 discusses how our results still hold under other representations and the effect they 
have on the computational landscape. 




Recall that 0 is the additive identity of the semiring(s) which also annihilates any element of D under 
multiplication. 

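Let $W \subseteq [n]$ be some subset of variables, and $\mathbf{y}_W \in \prod_{i \in W} \mathrm{Dom}(X_i)$ be some given value tuple. The
conditional factor $\psi_S(\cdot \mid \mathbf{y}_W)$ is a function from $\prod_{i \in S} \mathrm{Dom}(X_i)$ to $\mathbf{D}$ defined by

$$\psi_S(\mathbf{x}_S \mid \mathbf{y}_W) = \begin{cases} \mathbf{0} & \text{if } S \cap W \neq \emptyset \text{ and } \mathbf{x}_{S \cap W} \neq \mathbf{y}_{S \cap W}, \\ \psi_S(\mathbf{x}_S) & \text{otherwise.} \end{cases}$$

For each factor $\psi_S$, define its size to be the number of non-zero points under its domain:

$$\|\psi_S\| := |\{\mathbf{x}_S \mid \psi_S(\mathbf{x}_S) \neq \mathbf{0}\}|.$$

(This is also the number of rows in the table representing $\psi_S$ in the listing representation.) Let $T \subseteq S$ be
arbitrary; then we can write a factor size as a sum of conditional factor sizes:

$$\|\psi_S\| = \sum_{\mathbf{y}_T \in \prod_{i \in T} \mathrm{Dom}(X_i)} \|\psi_S(\cdot \mid \mathbf{y}_T)\|. \qquad (2)$$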


Throughout this paper, we use N to denote the maximum over all input factor sizes. 
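The listing representation, conditional factors, and identity (2) can be illustrated with a minimal Python sketch. The encoding is an assumption made for illustration only (a factor is a dict from tuples to non-zero values, with the scope kept separately); the names `size` and `condition` are hypothetical and do not come from the paper.

```python
from itertools import product

# Assumed encoding of the listing representation: a factor over scope S is a
# dict mapping each tuple x_S with a non-zero value to that value.
scope = ("A", "B")
psi = {(0, 0): 2.0, (0, 1): 3.0, (1, 1): 5.0}   # tuples not listed are 0-entries
dom = {"A": (0, 1), "B": (0, 1)}

def size(factor):
    """||psi_S||: the number of listed (non-zero) entries."""
    return len(factor)

def condition(factor, scope, y):
    """Conditional factor psi_S(. | y_W); y maps the variables in W to values.

    Entries that disagree with y on S ∩ W become 0, i.e. they are dropped.
    If S ∩ W is empty, the factor is returned unchanged.
    """
    fixed = [(i, y[v]) for i, v in enumerate(scope) if v in y]
    return {x: val for x, val in factor.items()
            if all(x[i] == a for i, a in fixed)}

# Identity (2): the size is the sum of the conditional sizes over all y_T.
T = ("A",)
total = sum(size(condition(psi, scope, dict(zip(T, y_T))))
            for y_T in product(*(dom[v] for v in T)))
```

Here `total` equals `size(psi)`, because every listed tuple agrees with exactly one value tuple on `T`.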

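Definition 4.2 (Indicator projection). For any two sets $T, S \subseteq [n]$ such that $S \cap T \neq \emptyset$, and a given factor
$\psi_S$, the function $\psi_{S/T} : \prod_{i \in T} \mathrm{Dom}(X_i) \to \mathbf{D}$ defined by

$$\psi_{S/T}(\mathbf{x}_T) := \begin{cases} \mathbf{1} & \exists \mathbf{x}_{S-T} \text{ s.t. } \psi_S(\mathbf{x}_T, \mathbf{x}_{S-T}) \neq \mathbf{0}, \\ \mathbf{0} & \text{otherwise} \end{cases}$$

is called the indicator projection of $\psi_S$ onto $T$.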
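As a rough illustration of Definition 4.2, the sketch below computes an indicator projection for a factor stored as a dict of non-zero entries (an assumed encoding). The helper name `indicator_projection` is hypothetical. Since only the coordinates in $S \cap T$ influence the value, the sketch stores the projection as the set of tuples over $S \cap T$ on which it equals 1.

```python
def indicator_projection(factor, scope, T):
    """Return (scope of S ∩ T, set of tuples where the projection is 1).

    factor: dict from tuples over `scope` to non-zero values.
    T: a collection of variables; only those also in `scope` matter.
    """
    keep = [i for i, v in enumerate(scope) if v in T]
    support = {tuple(x[i] for i in keep)
               for x, val in factor.items() if val != 0}
    return tuple(scope[i] for i in keep), support

# Example: project a ternary factor onto T = {A, C, D}.
psi_scope = ("A", "B", "C")
psi_abc = {(0, 0, 1): 2.5, (0, 1, 1): 4.0, (1, 0, 0): 1.0}
proj_scope, proj_support = indicator_projection(psi_abc, psi_scope, {"A", "C", "D"})
```

In the example, `proj_scope` is `("A", "C")` and the projection is 1 exactly on the pairs `(0, 1)` and `(1, 0)`.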


4.2 AGM-bound and fractional cover numbers 

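Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph. Let $B \subseteq \mathcal{V}$ be any subset of vertices. An integral edge cover of $B$ using
edges in $\mathcal{H}$ is a feasible solution $\lambda = (\lambda_S)_{S \in \mathcal{E}}$ to the following integer program:

$$\begin{aligned} \min \quad & \sum_{S \in \mathcal{E}} \lambda_S \\ \text{s.t.} \quad & \sum_{S : v \in S} \lambda_S \geq 1, \quad \forall v \in B \\ & \lambda_S \in \{0, 1\}, \quad \forall S \in \mathcal{E}, \end{aligned}$$

whose optimal objective value is denoted by $\rho_{\mathcal{H}}(B)$. The number $\rho_{\mathcal{H}}(B)$ is called the integral edge cover
number of $B$. Similarly, $\rho^*_{\mathcal{H}}(B)$ is the optimal objective value of the relaxation

$$\begin{aligned} \min \quad & \sum_{S \in \mathcal{E}} \lambda_S \\ \text{s.t.} \quad & \sum_{S : v \in S} \lambda_S \geq 1, \quad \forall v \in B \\ & \lambda_S \geq 0, \quad \forall S \in \mathcal{E}. \end{aligned}$$

Any feasible solution to the above linear program is called a fractional edge cover of $B$ using edges in $\mathcal{H}$.
Note that $\rho$ and $\rho^*$ are functions from $2^{\mathcal{V}}$ to $\mathbb{R}_+$.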

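Fix some input $(\psi_S)_{S \in \mathcal{E}}$ for our FAQ problem. Let $\lambda^* = (\lambda^*_S)_{S \in \mathcal{E}}$ denote an optimal solution to the linear
program

$$\begin{aligned} \min \quad & \sum_{S \in \mathcal{E}} \lambda_S \log_2 \|\psi_S\| \\ \text{s.t.} \quad & \sum_{S : v \in S} \lambda_S \geq 1, \quad \forall v \in B \\ & \lambda_S \geq 0, \quad \forall S \in \mathcal{E}. \end{aligned}$$

Then, the quantity

$$\mathrm{AGM}_{\mathcal{H}}(B) := \prod_{S \in \mathcal{E}} \|\psi_S\|^{\lambda^*_S} \qquad (3)$$

is called the AGM-bound for $B$ using edges of $\mathcal{H}$. It is obvious that

$$\mathrm{AGM}_{\mathcal{H}}(B) \leq N^{\rho^*_{\mathcal{H}}(B)}. \qquad (4)$$

We call the quantities $\rho_{\mathcal{H}}(\mathcal{V})$, $\rho^*_{\mathcal{H}}(\mathcal{V})$, and $\mathrm{AGM}_{\mathcal{H}}(\mathcal{V})$ the integral cover number, fractional cover number,
and AGM-bound corresponding to $\mathcal{H}$. When $\mathcal{H}$ is implicit from context, we shall drop the subscript $\mathcal{H}$ and
write $\rho(B)$, $\rho^*(B)$, and $\mathrm{AGM}(B)$ for the sake of brevity. Note that the AGM-bound is “data-dependent”
in the sense that it is a function of the input factors, while the cover numbers are only dependent on the
hypergraph. Thus, our notation $\mathrm{AGM}(B)$ is under the implicit assumption that the input factors are fixed.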
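As a small worked example of these definitions (the triangle hypergraph is chosen here for illustration and is not taken from this section), let $\mathcal{V} = \{1,2,3\}$, $\mathcal{E} = \{\{1,2\},\{2,3\},\{1,3\}\}$, $B = \mathcal{V}$, and assume every factor has size exactly $N$:

```latex
% Integral cover: one edge covers only two vertices, so two edges are needed.
\rho(\mathcal{V}) = 2 .

% Fractional cover: summing the three vertex constraints
%   \lambda_{12}+\lambda_{13} \ge 1,\quad
%   \lambda_{12}+\lambda_{23} \ge 1,\quad
%   \lambda_{23}+\lambda_{13} \ge 1
% gives 2(\lambda_{12}+\lambda_{23}+\lambda_{13}) \ge 3, and
% \lambda_{12}=\lambda_{23}=\lambda_{13}=\tfrac12 attains the bound.
\rho^*(\mathcal{V}) = \tfrac{3}{2} .

% With all sizes equal to N, the weighted LP has the same optimal solution, so
\mathrm{AGM}(\mathcal{V}) = N^{1/2} \cdot N^{1/2} \cdot N^{1/2} = N^{3/2} = N^{\rho^*(\mathcal{V})} .
```

In this instance inequality (4) holds with equality. When the factor sizes differ, the optimal weights generally shift toward the smaller factors and the AGM-bound can be strictly smaller than $N^{\rho^*(\mathcal{V})}$.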

4.3 Tree decomposition, acyclicity, and width-parameters 

Definition 4.3 (Tree decomposition). Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph. A tree-decomposition of $\mathcal{H}$ is a pair
$(T, \chi)$ where $T = (V(T), E(T))$ is a tree and $\chi : V(T) \to 2^{\mathcal{V}}$ assigns to each node of the tree $T$ a subset
of vertices of $\mathcal{H}$. The sets $\chi(t)$, $t \in V(T)$, are called the bags of the tree-decomposition. There are two
properties the bags must satisfy:

(a) For any hyperedge $F \in \mathcal{E}$, there is a bag $\chi(t)$, $t \in V(T)$, such that $F \subseteq \chi(t)$.

(b) For any vertex $v \in \mathcal{V}$, the set $\{t \mid t \in V(T), v \in \chi(t)\}$ is not empty and forms a connected subtree of $T$.

Definition 4.4 ($\alpha$-acyclic). A hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is $\alpha$-acyclic iff there exists a tree decomposition $(T, \chi)$
in which every bag $\chi(t)$ is a hyperedge of $\mathcal{H}$.

When $\mathcal{H}$ represents a join query, the tree $T$ in the above definition is also called the join tree of the query.
A query is acyclic if and only if its hypergraph is acyclic. While possessing many nice properties [36], the
notion of $\alpha$-acyclicity is unsatisfying because we can turn any hypergraph into an $\alpha$-acyclic hypergraph by
adding a hyperedge covering all its vertices. This observation motivates a second notion of acyclicity [36].

Definition 4.5 ($\beta$-acyclicity). A hypergraph $\mathcal{H}$ is $\beta$-acyclic iff the hypergraph formed by any subset of
edges of $\mathcal{H}$ is $\alpha$-acyclic.

To define commonly used width parameters of hypergraphs, we follow the width function framework
introduced by Adler [5]. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph. Let $g : 2^{\mathcal{V}} \to \mathbb{R}_+$ be a function that assigns
a non-negative real number to each subset of $\mathcal{V}$. Then, the $g$-width of a tree decomposition $(T, \chi)$ is
$\max_{t \in V(T)} g(\chi(t))$. The $g$-width of $\mathcal{H}$ is the minimum $g$-width over all tree decompositions of $\mathcal{H}$. Note that
the $g$-width of a hypergraph is a minimax function.

Definition 4.6 (Common width parameters). Let $s$ be the following function: $s(B) = |B| - 1$, $\forall B \subseteq \mathcal{V}$.
Then the treewidth of a hypergraph $\mathcal{H}$, denoted by $\mathrm{tw}(\mathcal{H})$, is exactly its $s$-width. The hypertree width of a
hypergraph $\mathcal{H}$, denoted by $\mathrm{htw}(\mathcal{H})$, is the $\rho$-width of $\mathcal{H}$, and the fractional hypertree width of a hypergraph
$\mathcal{H}$, denoted by $\mathrm{fhtw}(\mathcal{H})$, is the $\rho^*$-width of $\mathcal{H}$.
4.4 Vertex/variable ordering and its equivalence to tree decomposition

Besides tree decompositions, there is another equivalent way to characterize and define ($\alpha$/$\beta$-)acyclicity
and width parameters of hypergraphs using a listing of vertices of a hypergraph [15,16]. The results in this
section are probably well-known to researchers in this area, but we were not able to track down a precise
reference for some of our propositions below. Their proofs are presented in Appendix C for completeness.

Definition 4.7 (Vertex/variable ordering). A vertex ordering of a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is simply a listing
$\sigma = v_1, \ldots, v_n$ of all vertices in $\mathcal{V}$. Because we use vertices (of $\mathcal{H}$) and variables more or less interchangeably
in this paper, the term variable ordering will also be used with the same meaning.

In the literature “elimination order,” “elimination ordering,” or “global attribute order” are also used in 
place of “vertex ordering” [2,15,16,65]. However, we chose not to use “elimination order” here because a 
vertex ordering is meant to be the reverse of a (GYO) elimination ordering, as we explain below. 

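Elimination hypergraph sequence. Fix a vertex ordering $\sigma = v_1, \ldots, v_n$ of $\mathcal{H}$. For $j = n, n-1, \ldots, 1$ we
recursively define a sequence of $n$ hypergraphs $\mathcal{H}^\sigma_n, \mathcal{H}^\sigma_{n-1}, \ldots, \mathcal{H}^\sigma_1$ as follows. To avoid cumbersome
superscripting, we will denote the sequence as $\mathcal{H}_n, \ldots, \mathcal{H}_1$ when the vertex ordering $\sigma$ is clear from context.

Define $\mathcal{H}_n = (\mathcal{V}_n, \mathcal{E}_n) = (\mathcal{V}, \mathcal{E}) = \mathcal{H}$. Let $\partial(v_n)$ be the set of hyperedges of $\mathcal{H}_n$ incident to $v_n$, and $U_n$ be the
union of edges in $\partial(v_n)$:

$$\partial(v_n) = \{S \in \mathcal{E}_n \mid v_n \in S\}, \qquad (5)$$

$$U_n = \bigcup_{S \in \partial(v_n)} S. \qquad (6)$$

For each $j = n-1, n-2, \ldots, 1$, define the hypergraph $\mathcal{H}_j = (\mathcal{V}_j, \mathcal{E}_j)$ as follows.

$$\begin{aligned} \mathcal{V}_j &= \{v_1, \ldots, v_j\} \\ \mathcal{E}_j &= (\mathcal{E}_{j+1} - \partial(v_{j+1})) \cup \{U_{j+1} - \{v_{j+1}\}\} \\ \partial(v_j) &= \{S \in \mathcal{E}_j \mid v_j \in S\} \\ U_j &= \bigcup_{S \in \partial(v_j)} S. \end{aligned}$$

Again, strictly speaking the sets $\mathcal{V}_j$, $\mathcal{E}_j$, $\partial(v_j)$, and $U_j$ should have been denoted by $\mathcal{V}^\sigma_j$, $\mathcal{E}^\sigma_j$, $\partial^\sigma(v_j)$, and $U^\sigma_j$.
But we drop the superscript as $\sigma$ is implicitly understood.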

Definition 4.8. The above sequence of hypergraphs is called the elimination hypergraph sequence associated
with the vertex ordering $\sigma$. There is an intimate relationship between tree decompositions and vertex
orderings, which can be proved by making use of the above elimination hypergraph sequence.

Proposition 4.9 ($\alpha$-Acyclicity). A hypergraph $\mathcal{H}$ is $\alpha$-acyclic if and only if there is a vertex ordering
$\sigma = (v_1, \ldots, v_n)$ such that $U^\sigma_k \in \partial(v_k)$ for all $k \in [n]$.

Proposition 4.10 ($\beta$-Acyclicity). A hypergraph $\mathcal{H}$ is $\beta$-acyclic if and only if there is a vertex ordering
$\sigma = (v_1, \ldots, v_n)$ such that the collection of hyperedges in $\partial(v_k)$ forms a nested inclusion chain, for all $k \in [n]$.
Furthermore, $\beta$-acyclicity can be verified in polynomial time.

Definition 4.11 (Induced $g$-width). Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph. Let $g : 2^{\mathcal{V}} \to \mathbb{R}_+$ be a function and
$\sigma = (v_1, \ldots, v_n)$ be a vertex ordering of $\mathcal{H}$. Then, the induced $g$-width of $\sigma$ is the quantity $\max_{k \in [n]} g(U^\sigma_k)$.
When $g(B) = |B| - 1$, this is called the induced width of $\sigma$. When $g(B) = \rho_{\mathcal{H}}(B)$, this is called the induced
integral edge cover width of $\sigma$. When $g(B) = \rho^*_{\mathcal{H}}(B)$, this is called the induced fractional edge cover width of
$\sigma$.
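The elimination hypergraph sequence and the induced width of Definition 4.11 (for $g(B) = |B| - 1$) can be sketched in a few lines of Python. This is an illustrative sketch under the assumption that hyperedges are given as collections of hashable vertices; the function names are hypothetical.

```python
def elimination_sequence(edges, order):
    """Return {v_k: U_k} for the vertex ordering order = (v_1, ..., v_n).

    Vertices are processed from v_n down to v_1: the edges incident to the
    current vertex are replaced by their union minus that vertex.
    """
    current = {frozenset(e) for e in edges}
    U = {}
    for v in reversed(order):
        incident = {e for e in current if v in e}          # the set ∂(v)
        U[v] = frozenset().union(*incident)                # the set U
        current = (current - incident) | {U[v] - {v}}
    return U

def induced_width(edges, order):
    """Induced g-width of the ordering for g(B) = |B| - 1."""
    return max(len(u) for u in elimination_sequence(edges, order).values()) - 1

# A 4-cycle has induced width 2 under the ordering (1, 2, 3, 4),
# while a path on three vertices has induced width 1.
cycle = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
path = [{1, 2}, {2, 3}]
```

For the 4-cycle, eliminating vertex 4 first creates the new edge `{1, 3}`, which is why `U[3]` grows to `{1, 2, 3}`.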

We next characterize three width parameters of a hypergraph using vertex ordering. A function $g : 2^{\mathcal{V}} \to \mathbb{R}_+$
is said to be monotone if $g(A) \leq g(B)$ whenever $A \subseteq B$. We prove a generic lemma.


Lemma 4.12 ($g$-width). Let $g : 2^{\mathcal{V}} \to \mathbb{R}_+$ be a monotone function. A hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ has $g$-width
at most $w$ if and only if there exists a vertex ordering $\sigma$ of all vertices of $\mathcal{H}$ such that the induced $g$-width of
$\sigma$ is at most $w$.

Because the functions $s(U_k) = |U_k| - 1$, $\rho_{\mathcal{H}}(U_k)$, and $\rho^*_{\mathcal{H}}(U_k)$ are all monotone, the following results follow
immediately.

Corollary 4.13. A hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ has treewidth (respectively, generalized hypertree width, fractional
hypertree width) at most $w$ if and only if there exists a vertex ordering $\sigma = (v_1, \ldots, v_n)$ of $\mathcal{H}$ such that for
every $k \in [n]$ we have $|U^\sigma_k| \leq w + 1$ (respectively, $\rho_{\mathcal{H}}(U^\sigma_k) \leq w$, $\rho^*_{\mathcal{H}}(U^\sigma_k) \leq w$).

Note that the above corollary is actually three corollaries. The one regarding treewidth alone is well-known
in the probabilistic graphical model literature [10,32]. The other two are probably folklore, but we
were not able to find them explicitly stated anywhere.


5 The InsideOut Algorithm 

5.1 Algorithmic Warm-ups 

We first present a simple solution to the FAQ-SS problem. Recall FAQ-SS denotes the special case of FAQ
when all variable aggregates are the same: i.e. $\oplus^{(i)} = \oplus$, $\forall i > f$, and $(\mathbf{D}, \oplus, \otimes)$ is a semiring. Our aim is
to gently introduce the reader to the duality of backtracking search and dynamic programming, and to the
main idea of variable elimination.

5.1.1 Backtracking search 

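Consider the SumProd form of the FAQ expression (1) when there is no free variable and all aggregates are
the same semiring aggregate. In this case, we write the expression as

$$\varphi = \bigoplus_{\mathbf{x}} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) = \bigoplus_{x_1 \in \mathrm{Dom}(X_1)} \left( \bigoplus_{\mathbf{x}_{[n] - \{1\}}} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S \mid x_1) \right).$$

We can evaluate this expression by going through each value of $x_1$ and computing the inner expression
‘conditioned’ on this $x_1$. The naive implementation of this strategy wastes time if there is any $x_1$ for which
some conditional factor $\psi_S(\cdot \mid x_1)$ is identically $\mathbf{0}$. Thus, the obvious idea is to first compute the set $I_1$ of
values $x_1$ for which $\psi_S(\cdot \mid x_1) \not\equiv \mathbf{0}$ for all factors $\psi_S$. Then, recursively compute the expression

$$\varphi = \bigoplus_{x_1 \in I_1} \left( \bigoplus_{\mathbf{x}_{[n] - \{1\}}} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S \mid x_1) \right).$$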


Given that the input factors are represented using the listing format (i.e. only non-$\mathbf{0}$ entries are listed),
computing the above expression recursively is a join algorithm in disguise, and any of the algorithms from
[2,67,69,89] works. We call this the OutsideIn algorithm, as it evaluates the expression from the outer-most
aggregate to the inner-most. It will serve as the algorithmic building block of InsideOut. In fact, the OutsideIn
algorithm works even if there were free variables. The following is almost immediate (see Appendix D for
more details).

Theorem 5.1. Let $\varphi$ be an FAQ-SS query whose hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ has $m$ edges and $n$ vertices. Algorithm
OutsideIn computes $\varphi$ in time $O(mn \cdot \mathrm{AGM}_{\mathcal{H}}(\mathcal{V}) \cdot \log N)$.

OutsideIn is backtracking search [29,41], which was known 50 years ago in the AI and constraint programming
world. In the PGM literature, the method of conditioning search is similar, but the main theoretical
objective is so that the conditioning graph is acyclic [78]. The main advantage of backtracking search is that
it requires very little extra space. The main disadvantage is that it might have to resolve the same subproblem
multiple times. The duality between backtracking search and dynamic programming is well-known
in the constraint programming literature [81]. The next section explores the other side of this duality: the
dynamic programming side.
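The backtracking strategy of this subsection can be sketched as follows. This is a minimal illustration, not the worst-case optimal join algorithms cited above, and it makes no attempt to match the runtime of Theorem 5.1. It assumes that factors are dicts of non-zero entries, that every variable occurs in at least one factor, and that the semiring is passed in as plain Python callables; the name `outside_in` is hypothetical.

```python
import operator

def outside_in(variables, factors, add, mul, zero, one):
    """Evaluate the aggregate over x_1, ..., x_n of the product of all factors.

    variables: list of variables, outermost first.
    factors: list of (scope, table) pairs; table maps tuples over scope to
             non-zero values (listing representation).
    """
    def candidates(var, facs):
        # Values of `var` for which no conditional factor is identically zero.
        sets = [{x[scope.index(var)] for x in table}
                for scope, table in facs if var in scope]
        return set.intersection(*sets) if sets else set()

    def restrict(facs, var, a):
        # Condition every factor containing `var` on var = a.
        out = []
        for scope, table in facs:
            if var in scope:
                i = scope.index(var)
                table = {x: v for x, v in table.items() if x[i] == a}
            out.append((scope, table))
        return out

    def rec(i, facs, assignment):
        if i == len(variables):
            value = one
            for scope, table in facs:
                value = mul(value, table[tuple(assignment[v] for v in scope)])
            return value
        var = variables[i]
        total = zero
        for a in candidates(var, facs):
            total = add(total, rec(i + 1, restrict(facs, var, a),
                                   {**assignment, var: a}))
        return total

    return rec(0, factors, {})

# Counting triangles with the sum-product semiring over the integers.
R = {(0, 0): 1, (0, 1): 1, (1, 1): 1}
S = {(0, 1): 1, (1, 1): 1, (1, 0): 1}
T = {(0, 1): 1, (1, 0): 1}
triangle = [(("A", "B"), R), (("B", "C"), S), (("A", "C"), T)]
count = outside_in(["A", "B", "C"], triangle, operator.add, operator.mul, 0, 1)
```

Each recursive call only branches on values that survive in every conditional factor, which is the role played by the set $I_1$ in the text.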

5.1.2 Dynamic programming with variable elimination 

To solve the FAQ-SS problem using variable elimination [30,93,94], the idea is to “fold” common factors,
exploiting the distributive law:

$$\begin{aligned} \varphi(\mathbf{x}_F) &= \bigoplus_{x_{f+1}} \cdots \bigoplus_{x_n} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) \\ &= \bigoplus_{x_{f+1}} \cdots \bigoplus_{x_{n-1}} \bigotimes_{S \in \mathcal{E} - \partial(n)} \psi_S(\mathbf{x}_S) \otimes \underbrace{\left( \bigoplus_{x_n} \bigotimes_{S \in \partial(n)} \psi_S(\mathbf{x}_S) \right)}_{\text{new factor } \psi'_{U_n - \{n\}}}, \end{aligned}$$

where the equality follows from the fact that $\otimes$ distributes over $\oplus$, and recall from (5) and (6) that $\partial(n)$
denotes all edges incident to $n$ in $\mathcal{H}$ and $U_n = \bigcup_{S \in \partial(n)} S$. Note that the problem of computing the intermediate
factor $\psi'_{U_n - \{n\}}$ is exactly an FAQ-SS instance, where there is only one bound variable $X_n$, and $|U_n| - 1$ free
variables. Assume for the moment that we can somehow efficiently compute $\psi'_{U_n - \{n\}}$.

After computing $\psi'_{U_n - \{n\}}$, the resulting problem is another instance of FAQ-SS on a modified
multi-hypergraph $\mathcal{H}_{n-1}$, constructed from $\mathcal{H}$ by removing vertex $n$ along with all edges in $\partial(n)$, and adding
back a new hyperedge $U_n - \{n\}$. Recursively, we continue this process until all variables $X_n, \ldots, X_{f+1}$ are
eliminated. Textbook treewidth-based results for PGM inference are obtained this way [58]. In the database
context (i.e. given an FAQ-query over the Boolean semiring), the intermediate result $\psi'_{U_n - \{n\}}$ is essentially
an intermediate materialized relation of a query plan.
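One step of this basic variable elimination can be sketched as below. The sketch deliberately enumerates the full domain product of $U_n - \{n\}$, so it illustrates the "textbook" dense version rather than anything tuned for sparse inputs. The encoding (dict factors, a `domains` map, semiring callables) and the name `eliminate` are assumptions made for illustration.

```python
import operator
from itertools import product

def eliminate(factors, var, domains, add, mul, zero, one):
    """Eliminate `var` under the semiring (add, mul).

    factors: list of (scope, table) pairs in listing representation.
    Returns the factors not containing `var` plus one new factor whose scope
    is the union of the scopes containing `var`, minus `var` itself.
    """
    touching = [(s, t) for s, t in factors if var in s]     # edges in ∂(var)
    rest = [(s, t) for s, t in factors if var not in s]
    union = sorted({v for s, _ in touching for v in s})     # the set U
    new_scope = tuple(v for v in union if v != var)
    new_table = {}
    for x in product(*(domains[v] for v in new_scope)):
        assignment = dict(zip(new_scope, x))
        total = zero
        for a in domains[var]:
            assignment[var] = a
            term = one
            for s, t in touching:
                term = mul(term, t.get(tuple(assignment[v] for v in s), zero))
            total = add(total, term)
        if total != zero:
            new_table[x] = total
    return rest + [(new_scope, new_table)]

# Sum-product over a chain A - B - C, eliminating C, then B, then A.
domains = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
chain = [(("A", "B"), {(0, 0): 1, (0, 1): 2, (1, 1): 3}),
         (("B", "C"), {(0, 0): 4, (1, 0): 5, (1, 1): 6})]
step = chain
for v in ("C", "B", "A"):
    step = eliminate(step, v, domains, operator.add, operator.mul, 0, 1)
```

After the loop, `step` holds a single nullary factor whose only entry is the value of the whole sum-product expression.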

5.2 The InsideOut Algorithm 

5.2.1 Introducing the indicator projections 

While correct, basic variable elimination as described in Section 5.1.2 is potentially not very efficient for
sparse input factors, i.e. factors whose sizes are smaller than the product of the domain sizes. The main
reason is that the product that was factored out (i.e. $\bigotimes_{S \in \mathcal{E} - \partial(n)} \psi_S(\mathbf{x}_S)$) might annihilate many entries of
the intermediate result $\psi'_{U_n - \{n\}}$, while we have spent so much time computing $\psi'_{U_n - \{n\}}$. For example, for
an $S \notin \partial(n)$ and tuple $\mathbf{y}_S$ such that $S \subseteq U_n$ and $\psi_S(\mathbf{y}_S) = \mathbf{0}$, we do not need to compute the entries
$\psi'_{U_n - \{n\}}(\mathbf{x}_{U_n - \{n\}})$ for which $\mathbf{y}_S = \mathbf{x}_S$: those entries will be killed later anyhow. The idea is then to only
compute those $\psi'_{U_n - \{n\}}(\mathbf{x}_{U_n - \{n\}})$ values that will ‘survive’ the other factors. One simple way to achieve
this is to allow for the factors that were factored out of the scope of $X_n$ to still participate in computing
the intermediate factor:

$$\psi_{U_n - \{n\}}(\mathbf{x}_{U_n - \{n\}}) = \bigoplus_{x_n} \left[ \left( \bigotimes_{S \in \partial(n)} \psi_S(\mathbf{x}_S) \right) \otimes \underbrace{\left( \bigotimes_{\substack{S \notin \partial(n), \\ S \cap U_n \neq \emptyset}} \psi_{S/U_n}(\mathbf{x}_{S \cap U_n}) \right)}_{\text{indicator projections}} \right]. \qquad (7)$$

For a set $S \in \mathcal{E}_n - \partial(n)$ with $S \cap U_n \neq \emptyset$, the participation of a factor $\psi_S$ in computing $\psi_{U_n - \{n\}}$ is only
to “confirm” that entries computed are not wasteful; thus, only their indicator projections participate and
not the real factors $\psi_S$ themselves. In database terms, one can also think of the definition of $\psi_{U_n - \{n\}}$ in (7)
as a simultaneous semijoin reduction of the main product $\prod_{S \in \partial(n)} \psi_S$ with all of the “tables” $\psi_S$ for which
$S \cap U_n \neq \emptyset$.

The problem defined in (7) is an FAQ-SS instance with $|U_n| - 1$ free variables, which we can solve using
the OutsideIn algorithm as described in Section 5.1.1. Note the important point that, algorithmically, we do
not perform the “semijoins” individually; rather, we compute the multiway join using a worst-case optimal
join algorithm.

5.2.2 The general case 

We finally describe how InsideOut deals with a general FAQ-query defined by the expression (1). Recall that
each operator $\oplus^{(i)}$ for $f < i \leq n$ either forms a semiring with $\otimes$ or is $\otimes$ itself. Let us see how the last
variable can be eliminated in this general scenario. The elimination depends on two cases.

Case 1: $(\mathbf{D}, \oplus^{(n)}, \otimes)$ forms a semiring. In this case we apply the same strategy as before: we compute the
intermediate factor $\psi_{U_n - \{n\}}$ defined in (7) using OutsideIn.

Case 2: $\oplus^{(n)} = \otimes$. In this case, we rewrite the expression as follows.


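$$\begin{aligned} \varphi(\mathbf{x}_{[f]}) &= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{x_n} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) \\ &= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{S \in \mathcal{E}} \left( \bigotimes_{x_n} \psi_S(\mathbf{x}_S) \right) \\ &= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{S \notin \partial(n)} \left[ \psi_S(\mathbf{x}_S) \right]^{|\mathrm{Dom}(X_n)|} \otimes \left( \bigotimes_{x_n} \bigotimes_{S \in \partial(n)} \psi_S(\mathbf{x}_S) \right) \\ &= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{S \notin \partial(n)} \underbrace{\left[ \psi_S(\mathbf{x}_S) \right]^{|\mathrm{Dom}(X_n)|}}_{\psi'_S} \otimes \bigotimes_{S \in \partial(n)} \underbrace{\left( \bigotimes_{x_n} \psi_S(\mathbf{x}_S) \right)}_{\psi_{S - \{n\}}}. \end{aligned} \qquad (8)$$

Thus, in Case 2, the new FAQ instance has as its input the factors $\psi'_S := \psi_S^{|\mathrm{Dom}(X_n)|}$ for $S \notin \partial(n)$, and the
factors $\psi_{S - \{n\}}(\mathbf{x}_{S - \{n\}}) := \bigotimes_{x_n} \psi_S(\mathbf{x}_S)$ for $S \in \partial(n)$. We discuss how to compute these new factors in turn.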

First, consider $S \notin \partial(n)$. When “passing through” a product aggregate, the factors $\psi_S$ for $S \notin \partial(n)$
are powered up, point-wise, by a power of $|\mathrm{Dom}(X_n)|$. By repeated squaring, the number of multiplications
needed is within

$$\sum_{S \in \mathcal{E}_n - \partial(n)} 2 \cdot \|\psi_S\| \cdot \lceil \log_2 |\mathrm{Dom}(X_n)| \rceil.$$

So these factors are generally changed to new factors with the same size.
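The repeated-squaring bound can be checked with a short sketch. The function `power` below is a hypothetical helper, written for an arbitrary associative multiplication passed in as a callable; it also counts how many multiplications it performs, so the count can be compared with $2 \lceil \log_2 d \rceil$ for each exponent $d$.

```python
import math
import operator

def power(x, d, mul):
    """Compute the d-fold product of x (d >= 1) by repeated squaring.

    Returns (result, number of multiplications performed).
    """
    result, base, count = None, x, 0
    while d > 0:
        if d & 1:
            if result is None:
                result = base                 # first factor: no multiplication
            else:
                result = mul(result, base)
                count += 1
        d >>= 1
        if d:
            base = mul(base, base)            # squaring step
            count += 1
    return result, count

value, used = power(3, 10, operator.mul)      # 3 ** 10 with few multiplications
```

Applying `power` to every listed entry of a factor gives the point-wise powering described above. Entries that are 0 stay 0, which is why the size of the factor does not change.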

There is one case in which we do not have to power them up: when $\psi_S(\mathbf{x}_S)$ is an idempotent element under
the product aggregate $\otimes$. In particular, if we knew that $\psi_S(\mathbf{x}_S) \in \{\mathbf{0}, \mathbf{1}\}$ for all $\mathbf{x}_S$, then $\psi_S(\mathbf{x}_S)^{|\mathrm{Dom}(X_n)|} =
\psi_S(\mathbf{x}_S)$ and we can factor out $\psi_S$ as in the semiring case. This is indeed the case for the instances of FAQ
that were reduced from QCQ and #QCQ as shown in Example A.20 and Example 1.3. Motivated by those
two examples, we define the following concept.

Definition 5.2 (Idempotent product aggregate). An aggregate $\oplus^{(k)}$ is called an idempotent product aggregate
with respect to the input variable ordering if it is a product aggregate in which for all $S \in \mathcal{E}_k \setminus \partial(k)$, the
(intermediate) factor $\psi_S$ has as its range the idempotent elements of the operator $\otimes$. In particular, if
$\psi_S(\mathbf{x}_S) \in \{\mathbf{0}, \mathbf{1}\}$ for all $S \in \mathcal{E}_k \setminus \partial(k)$, then $\oplus^{(k)}$ is an idempotent product aggregate whenever $\oplus^{(k)} = \otimes$.

Note again that for the FAQ instances which are constructed from the reductions from QCQ and #QCQ,
all product aggregates are idempotent. In such a case, we can rewrite (8) as

$$\varphi(\mathbf{x}_{[f]}) = \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{S \notin \partial(n)} \psi_S(\mathbf{x}_S) \otimes \bigotimes_{S \in \partial(n)} \underbrace{\left( \bigotimes_{x_n} \psi_S(\mathbf{x}_S) \right)}_{\psi_{S - \{n\}}}.$$
Second, consider $S \in \partial(n)$. The factor $\psi_{S - \{n\}}$ can be thought of as a “product marginalization” of $\psi_S$,
where we “marginalize out” the $X_n$ variable. Since we can compute these factors individually, we do not
have to solve the costly intermediate FAQ-SS instance $\psi_{U_n - \{n\}}$, as was done in Case 1.
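A product marginalization over a factor in listing representation can be sketched as a single grouping pass. The point worth noting is that a missing tuple is a 0 entry, so a group survives only if it lists a value for every element of the eliminated variable's domain. The sketch assumes numeric values with ordinary multiplication and a dict encoding of the factor; `product_marginalize` is a hypothetical name.

```python
import math
from collections import defaultdict

def product_marginalize(scope, table, var, domain_size):
    """Multiply out `var` from one factor in listing representation.

    Returns (new_scope, new_table), where new_table[x] is the product of the
    factor's values over all values of `var`. Groups with fewer than
    `domain_size` listed entries contain a 0 entry and are dropped.
    """
    i = scope.index(var)
    groups = defaultdict(list)
    for x, val in table.items():
        groups[x[:i] + x[i + 1:]].append(val)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {x: math.prod(vals) for x, vals in groups.items()
                 if len(vals) == domain_size}
    return new_scope, new_table

# A = 0 lists both values of C, so it survives; A = 1 lists only C = 0.
marg = product_marginalize(("A", "C"), {(0, 0): 2, (0, 1): 3, (1, 0): 5}, "C", 2)
```

Each input factor is scanned once, which is in line with the observation that these factors can be computed individually.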


5.2.3 Output representation 

Every time InsideOut eliminates a variable, it obtains a new FAQ instance with one less variable. Let
$\mathcal{H}_n = \mathcal{H}, \mathcal{H}_{n-1}, \ldots$ denote the multi-hypergraphs of the resulting instances; i.e., $\mathcal{H}_k$ is the hypergraph of the
FAQ instance just before we eliminate the $k$th variable. We shall define the hypergraph sequence formally
below. Before then, we describe how the output is computed.

Let $\mathcal{H}_f = (\mathcal{V}_f = [f], \mathcal{E}_f)$ denote the hypergraph of the FAQ instance resulting from eliminating all bound
variables. With some abuse of notation, let $\psi_S$, $S \in \mathcal{E}_f$, denote the set of input factors to the $\mathcal{H}_f$ instance.
The output to FAQ is now the expression

$$\varphi(\mathbf{x}_{[f]}) = \bigotimes_{S \in \mathcal{E}_f} \psi_S(\mathbf{x}_S), \qquad (9)$$

which is an FAQ instance with all free variables. We can compute the output directly by running OutsideIn
on (9). However, inspired by Olteanu and Zavodny [73], we can first compute the output in the factorized
representation, and then report it.

Intuitively, the idea is that we can first compute the set of tuples $\mathbf{x}_{[f]}$ for which $\varphi(\mathbf{x}_{[f]}) \neq \mathbf{0}$, and then
for each such tuple compute its value $\varphi(\mathbf{x}_{[f]})$ using (9) by simply multiplying together the corresponding
$\psi_S$-values. Computing the non-$\mathbf{0}$ output tuples is a join problem, which in the FAQ setting is the $(\cup, \cap)$,
$(\vee, \wedge)$, or $(\bar{\oplus}, \otimes)$ semiring under $\mathbf{D} = \{\mathbf{0}, \mathbf{1}\}$, where $\bar{\oplus}$ is the 01-OR operator defined below.

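Definition 5.3 (01-OR). Define the 01-OR operator, denoted by $\bar{\oplus}$, as follows. Given $a, b \in \{\mathbf{0}, \mathbf{1}\}$,

$$a \mathbin{\bar{\oplus}} b = \begin{cases} \mathbf{0} & \text{if } a = b = \mathbf{0} \\ \mathbf{1} & \text{otherwise.} \end{cases}$$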
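Because the domain has only two elements, the semiring axioms for the 01-OR operator of Definition 5.3 can be checked exhaustively. The sketch below does so under the assumption that the product restricted to $\{\mathbf{0}, \mathbf{1}\}$ behaves like ordinary multiplication of the integers 0 and 1; the function names are illustrative.

```python
from itertools import product

def or01(a, b):
    """The 01-OR operator: 0 if a = b = 0, and 1 otherwise."""
    return 0 if a == 0 and b == 0 else 1

def times(a, b):
    """The product restricted to {0, 1} (assumed to act like integer product)."""
    return a * b

def is_semiring_on_01():
    """Exhaustively check the commutative semiring axioms on {0, 1}."""
    D = (0, 1)
    for a, b, c in product(D, repeat=3):
        if or01(or01(a, b), c) != or01(a, or01(b, c)):
            return False                      # 01-OR is not associative
        if times(times(a, b), c) != times(a, times(b, c)):
            return False                      # product is not associative
        if times(a, or01(b, c)) != or01(times(a, b), times(a, c)):
            return False                      # product does not distribute
    for a, b in product(D, repeat=2):
        if or01(a, b) != or01(b, a) or times(a, b) != times(b, a):
            return False                      # commutativity fails
    # 0 is the additive identity and annihilates; 1 is the multiplicative identity.
    return all(or01(a, 0) == a and times(a, 1) == a and times(a, 0) == 0
               for a in D)
```

Calling `is_semiring_on_01()` walks through all eight triples and four pairs, so a `True` result covers every case.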


Clearly $(\bar{\oplus}, \otimes)$ is a semiring under the domain $\{\mathbf{0}, \mathbf{1}\}$. We use this semiring to realize our intuition above
as follows. We continue running InsideOut to compute the following constant $\overline{\psi}_\emptyset$,$^9$ whose value belongs to $\{\mathbf{0}, \mathbf{1}\}$:

$$\overline{\psi}_\emptyset = \overline{\bigoplus_{\mathbf{x}_{[f]}}} \bigotimes_{S \in \mathcal{E}_f} \psi_{S/S}(\mathbf{x}_S).$$

As before, when eliminating variables $X_f, X_{f-1}, \ldots, X_1$ we will obtain intermediate factors $\overline{\psi}_{U_k - \{k\}}$ for
$k = f, f-1, \ldots, 1$. However, in this case we keep around a tiny bit more book-keeping information; for
$k = f, f-1, \ldots, 1$, we compute the following two intermediate factors:

$$\overline{\psi}_{U_k}(\mathbf{x}_{U_k}) = \bigotimes_{\substack{S \in \mathcal{E}_k, \\ S \cap U_k \neq \emptyset}} \psi_{S/U_k}(\mathbf{x}_{S \cap U_k}), \qquad (10)$$

$$\overline{\psi}_{U_k - \{k\}}(\mathbf{x}_{U_k - \{k\}}) = \overline{\bigoplus_{x_k}} \, \overline{\psi}_{U_k}(\mathbf{x}_{U_k}). \qquad (11)$$

Computing $\overline{\psi}_{U_k}(\mathbf{x}_{U_k})$, defined in (10), is no extra work for the OutsideIn algorithm, which computes all points
$\mathbf{x}_{U_k}$ for which $\overline{\psi}_{U_k}(\mathbf{x}_{U_k}) \neq \mathbf{0}$ anyhow. We just need to keep it around more explicitly in this case. Note also
that $U_1 - \{1\} = \emptyset$, and $\overline{\psi}_{U_1 - \{1\}} = \overline{\psi}_\emptyset$.

Finally, we run OutsideIn on the following expression instead of (9):

$$\varphi(\mathbf{x}_{[f]}) = \left( \bigotimes_{S \in \mathcal{E}_f} \psi_S(\mathbf{x}_S) \right) \otimes \left( \bigotimes_{k \in [f]} \overline{\psi}_{U_k}(\mathbf{x}_{U_k}) \right). \qquad (12)$$

9 A nullary function is a constant.

The fact that (9) defines exactly the same function as (12) follows from the following trivial observation: for
every fixed tuple $\mathbf{x}_{[f]}$, $\psi_S(\mathbf{x}_S) \neq \mathbf{0}$ for all $S \in \mathcal{E}_f$ implies $\psi_{S/S}(\mathbf{x}_S) = \mathbf{1}$ for all $S \in \mathcal{E}_f$, which in turn implies
$\overline{\psi}_{U_k}(\mathbf{x}_{U_k}) = \mathbf{1}$ for all $k \in [f]$.

Eliminating free variables and then recovering them back by (12) are equivalent to the two phases of
Yannakakis' algorithm [90]. Our implementation of InsideOut in LogicBlox follows the elimination/recovery
method. The reason OutsideIn is faster on (12) than on (9) is that the factors $\overline{\psi}_{U_k}$ help filter out potential
tuples $\mathbf{x}_{[f]}$ for which $\varphi(\mathbf{x}_{[f]}) = \mathbf{0}$. We will see in the analysis below the effect of this idea in the overall
runtime. One nice consequence of this strategy is that, in essence, “freeness” (of variables) is also a semiring
aggregate. The entire algorithm is sketched out in Algorithm 1.


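Algorithm 1 InsideOut for FAQ

Require: Hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, factors $\psi_S$, $S \in \mathcal{E}$, set $F = [f]$ of free variables
Require: FAQ query $\varphi$ in the form (1)

 1: $\mathcal{E}_n \leftarrow \mathcal{E}$
 2: for ($k \leftarrow n$ downto 1) do
 3:     $\partial(k) \leftarrow \{S \mid S \in \mathcal{E}_k \text{ and } k \in S\}$
 4:     if $k > f$ and $(\mathbf{D}, \oplus^{(k)}, \otimes)$ is a semiring or $k \leq f$ then
 5:         $U_k \leftarrow \bigcup_{S \in \partial(k)} S$
 6:         if $k > f$ then
 7:             $\psi_{U_k - \{k\}}(\mathbf{x}_{U_k - \{k\}}) \leftarrow \bigoplus^{(k)}_{x_k} \left[ \left( \bigotimes_{S \in \partial(k)} \psi_S(\mathbf{x}_S) \right) \otimes \left( \bigotimes_{S \in \mathcal{E}_k - \partial(k),\, S \cap U_k \neq \emptyset} \psi_{S/U_k}(\mathbf{x}_{S \cap U_k}) \right) \right]$
 8:         else
 9:             $\overline{\psi}_{U_k}(\mathbf{x}_{U_k}) \leftarrow \bigotimes_{S \in \mathcal{E}_k,\, S \cap U_k \neq \emptyset} \psi_{S/U_k}(\mathbf{x}_{S \cap U_k})$
10:             $\overline{\psi}_{U_k - \{k\}}(\mathbf{x}_{U_k - \{k\}}) \leftarrow \overline{\bigoplus}_{x_k} \overline{\psi}_{U_k}(\mathbf{x}_{U_k})$
11:         $\mathcal{E}_{k-1} \leftarrow (\mathcal{E}_k \setminus \partial(k)) \cup \{U_k - \{k\}\}$
12:     else
13:         for each $S \in \partial(k)$ do
14:             Compute the product marginalization factor $\psi_{S - \{k\}}(\mathbf{x}_{S - \{k\}}) = \bigotimes_{x_k} \psi_S(\mathbf{x}_S)$
15:         for each $S \in \mathcal{E}_k - \partial(k)$ do
16:             if the range of $\psi_S$ is not idempotent w.r.t. $\otimes$ then
17:                 $\psi_S(\mathbf{x}_S) \leftarrow \psi_S(\mathbf{x}_S)^{|\mathrm{Dom}(X_k)|}$ for all $\mathbf{x}_S$ where $\psi_S(\mathbf{x}_S)$ is not $\otimes$-idempotent
18:         $\mathcal{E}_{k-1} \leftarrow (\mathcal{E}_k \setminus \partial(k)) \cup \{S \setminus \{k\} \mid S \in \partial(k)\}$
19: Output $\varphi$ by running OutsideIn on FAQ-expression (12)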


5.3 Analysis 

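Definition 5.4 (Elimination hypergraph sequence). Algorithm 1 defines a sequence of hypergraphs $\mathcal{H}_k =
(\mathcal{V}_k = [k], \mathcal{E}_k)$, for $k = n, n-1, \ldots, 1$ recursively. The algorithm also defines the sets $U_k$ and $\partial(k)$. Formally,
let $\mathcal{H}_n \stackrel{\text{def}}{=} ([n], \mathcal{E}_n) = \mathcal{H}$, and let $\partial(n)$ and $U_n$ be defined as in (5) and (6). For $k = n-1, \ldots, 1$, construct $\mathcal{H}_k$ from
$\mathcal{H}_{k+1}$ as follows:

$$\mathcal{V}_k = \mathcal{V}_{k+1} - \{k+1\}, \qquad \partial(k) = \{S \in \mathcal{E}_k \mid k \in S\}, \qquad U_k = \bigcup_{S \in \partial(k)} S.$$

• If $\oplus^{(k+1)} = \otimes$, then $\mathcal{E}_k$ is obtained from $\mathcal{E}_{k+1}$ by removing $k+1$ from all edges in $\mathcal{E}_{k+1}$.

• Otherwise, we construct $\mathcal{H}_k = (\mathcal{V}_k = [k], \mathcal{E}_k)$ by defining

$$\mathcal{E}_k = (\mathcal{E}_{k+1} - \partial(k+1)) \cup \{U_{k+1} - \{k+1\}\}.$$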
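The two cases of Definition 5.4 can be traced mechanically. The sketch below is illustrative only: the function name and the list-of-frozensets encoding are assumptions, and edges are kept in a list because the $\mathcal{H}_k$ are multi-hypergraphs. It is run on the hypergraph with edges $\{1,5\}, \{2,5\}, \{1,3,4\}, \{2,3,6\}$, with variable 3 carrying a product aggregate, which is the query used in Example 5.6.

```python
def faq_elimination_sequence(edges, n, product_vars):
    """Follow Definition 5.4 for the variable ordering 1, ..., n.

    edges: iterable of vertex sets.
    product_vars: the set of k whose aggregate is the product.
    Returns ({k: E_k}, {k: U_k}) with each E_k a list of frozensets.
    """
    current = [frozenset(e) for e in edges]
    E, U = {n: list(current)}, {}
    for k in range(n, 0, -1):
        incident = [e for e in current if k in e]           # the set ∂(k)
        U[k] = frozenset().union(*incident)
        if k in product_vars:
            # Product aggregate: remove k from every edge.
            current = [e - {k} for e in current]
        else:
            # Semiring aggregate: replace ∂(k) by the single edge U_k - {k}.
            current = [e for e in current if k not in e] + [U[k] - {k}]
        if k > 1:
            E[k - 1] = list(current)
    return E, U

E_seq, U_seq = faq_elimination_sequence(
    [{1, 5}, {2, 5}, {1, 3, 4}, {2, 3, 6}], n=6, product_vars={3})
```

In this run, eliminating variable 5 produces the edge `{1, 2}`, and passing through the product on variable 3 leaves the edges `{2}`, `{1, 2}`, and `{1}`.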


From these notations, and recalling the AGM-bound definition (3), the following is the main theorem of 
this section. 


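Theorem 5.5. Suppose $\varphi$ is an FAQ query whose hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ has $m$ edges and $n$ vertices. Define

$$K := [f] \cup \{k \mid k > f, \oplus^{(k)} \neq \otimes\}. \qquad (13)$$

For each $k \in [n] - K$, $S \in \mathcal{E}_k \setminus \partial(k)$, define

$$\mathrm{Idem}_k(\psi_S) = \begin{cases} 0 & \text{if the range of } \psi_S \text{ is idempotent w.r.t. } \otimes \\ 2 \lceil \log_2 |\mathrm{Dom}(X_k)| \rceil & \text{otherwise.} \end{cases}$$

Then, the InsideOut algorithm (Algorithm 1) computes $\varphi$ in time

$$O\left( \log N \cdot \left( \sum_{k \in K} |U_k| \cdot |\{S \in \mathcal{E}_k : S \cap U_k \neq \emptyset\}| \cdot \mathrm{AGM}_{\mathcal{H}_k}(U_k) + \sum_{\substack{k \notin K \\ S \in \partial(k)}} |S| \cdot \|\psi_S\| + \sum_{\substack{k \notin K \\ S \in \mathcal{E}_k \setminus \partial(k)}} |S| \cdot \|\psi_S\| \cdot \mathrm{Idem}_k(\psi_S) + f(f+m) \|\varphi\| \right) \right). \qquad (14)$$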




Proof. The runtime of InsideOut is the sum of the runtimes of $n$ variable elimination steps plus the time
needed to report the output at the end. For $k = n, \ldots, 1$, the cost of the $k$th elimination step depends on
whether the $k$th variable aggregate is a product aggregate or not:

• If $k > f$ and $\oplus^{(k)} = \otimes$, then the runtime is the total time it takes to compute the intermediate factors
shown in (8), each of which is either $O(|S| \cdot \|\psi_S\| \cdot \log N)$ or $O(|S| \cdot \|\psi_S\| \cdot \mathrm{Idem}_k(\psi_S) \cdot \log N)$.

• If either $k \leq f$, or $k > f$ and the aggregate is a semiring aggregate, then the runtime is dominated
by the OutsideIn algorithm’s runtime to compute the intermediate factor $\psi_{U_k - \{k\}}$. From Theorem 5.1,
this runtime is bounded by $O(|U_k| \cdot |\{S \in \mathcal{E}_k : S \cap U_k \neq \emptyset\}| \cdot \mathrm{AGM}_{\mathcal{H}_k}(U_k) \cdot \log N)$.

The final invocation of OutsideIn on (12) reports the output (in listing representation) in linear time (modulo
a log factor) in $\|\varphi\|$, just like the output phase of Yannakakis' algorithm. This is because the participation
of the factors $\overline{\psi}_{U_k}$, $k \in \{1, \ldots, f\}$, in the formula ensures that every binding of the backtracking search
algorithm is part of an output tuple. □

In the above discussion and analysis of InsideOut, we eliminated the variables $X_n, X_{n-1}, \ldots, X_1$ in the
order given by the input FAQ-expression (1). However, as Example 5.6 shows, there is no reason to force
InsideOut to follow this particular order. In particular, there might be a different variable ordering for which 
the overall runtime of InsideOut is a lot smaller and the algorithm still works correctly on that ordering. 

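Example 5.6 (Effect of variable ordering on runtime). We illustrate InsideOut and the effect of different
variable orderings on the runtime of InsideOut with an example. Consider the following FAQ query (without
free variables).

$$\varphi = \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \max_{x_6} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \psi_{\{2,3,6\}}.$$

Here, the support of each factor determines the variables so we do not write down the parameters of each
factor for the sake of brevity. Also, in this example we consider $\mathbf{D} = \mathbb{R}_+$ and every input factor has range $\mathbf{D}$.
Now, a straightforward run of the InsideOut algorithm (Algorithm 1) using the variable ordering $(X_1, \ldots, X_6)$
evaluates the above expression as follows.

$$\begin{aligned}
\varphi &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \max_{x_6} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \psi_{\{2,3,6\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \max_{x_6} \psi_{\{2,3,6\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \psi_{\{2,3\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \psi_{\{1,3,4\}} \psi_{\{2,3\}} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \\
\text{(spent } O(N^2)\text{-time)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \psi_{\{1,3,4\}} \psi_{\{2,3\}} \psi_{\{1,2\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \psi_{\{2,3\}} \psi_{\{1,2\}} \sum_{x_4} \psi_{\{1,3,4\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \psi_{\{2,3\}} \psi_{\{1,2\}} \psi_{\{1,3\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \left( \prod_{x_3} \psi_{\{2,3\}} \right) \left( \prod_{x_3} \psi_{\{1,2\}} \right) \left( \prod_{x_3} \psi_{\{1,3\}} \right) \\
\text{(spent } O(N^2)\text{-time)} &= \max_{x_1} \max_{x_2} \psi_{\{2\}} \psi'_{\{1,2\}} \psi_{\{1\}} \\
\text{(re-write)} &= \max_{x_1} \psi_{\{1\}} \max_{x_2} \psi_{\{2\}} \psi'_{\{1,2\}} \\
\text{(spent } O(N^2)\text{-time)} &= \max_{x_1} \psi_{\{1\}} \psi'_{\{1\}} \\
\text{(spent } O(N)\text{-time)} &= \psi_\emptyset.
\end{aligned}$$

Note that $\psi'_{\{1,2\}}(x_1, x_2) = \psi_{\{1,2\}}(x_1, x_2)^{|\mathrm{Dom}(X_3)|}$, for every $(x_1, x_2)$. The overall runtime of the algorithm is
$O(N^2)$.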

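Next, let us see how a slight change in the assumption on the input factors allows us to conclude that the
aggregate on $X_3$ is an idempotent product aggregate. This in turn allows a different variable ordering to be
equivalent to the original variable ordering, which helps reduce the overall runtime of InsideOut. Suppose
we knew that all input factors have range $\{0, 1\}$. Then, we can evaluate the query as follows.

$$\begin{aligned}
\varphi &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \max_{x_6} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \psi_{\{2,3,6\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \max_{x_6} \psi_{\{2,3,6\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1,3,4\}} \psi_{\{2,3\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \sum_{x_4} \psi_{\{1,3,4\}} \psi_{\{2,3\}} \left( \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \psi_{\{2,3\}} \left( \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \sum_{x_4} \psi_{\{1,3,4\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_1} \max_{x_2} \prod_{x_3} \psi_{\{2,3\}} \left( \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \psi_{\{1,3\}} \\
\text{(re-write)} &= \max_{x_1} \max_{x_2} \left( \prod_{x_3} \psi_{\{2,3\}} \right) \left( \prod_{x_3} \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \left( \prod_{x_3} \psi_{\{1,3\}} \right) \\
\text{(}\times\text{ is acting idempotently)} &= \max_{x_1} \max_{x_2} \left( \prod_{x_3} \psi_{\{2,3\}} \right) \left( \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \left( \prod_{x_3} \psi_{\{1,3\}} \right) \\
\text{(spent } O(N)\text{-time)} &= \max_{x_1} \max_{x_2} \psi_{\{2\}} \left( \max_{x_5} \psi_{\{1,5\}} \psi_{\{2,5\}} \right) \psi_{\{1\}} \\
\text{(re-arrange)} &= \max_{x_1} \max_{x_2} \max_{x_5} \psi_{\{2\}} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1\}} \\
\text{(a crucial re-arrangement)} &= \max_{x_5} \max_{x_1} \max_{x_2} \psi_{\{2\}} \psi_{\{1,5\}} \psi_{\{2,5\}} \psi_{\{1\}} \\
\text{(re-arrange)} &= \max_{x_5} \max_{x_1} \psi_{\{1,5\}} \psi_{\{1\}} \max_{x_2} \psi_{\{2,5\}} \psi_{\{2\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_5} \max_{x_1} \psi_{\{1,5\}} \psi_{\{1\}} \psi_{\{5\}} \\
\text{(re-arrange)} &= \max_{x_5} \psi_{\{5\}} \max_{x_1} \psi_{\{1,5\}} \psi_{\{1\}} \\
\text{(spent } O(N)\text{-time)} &= \max_{x_5} \psi_{\{5\}} \psi'_{\{5\}} \\
\text{(spent } O(N)\text{-time)} &= \psi_\emptyset.
\end{aligned}$$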

In this case, because all factors inside the scope of the product aggregate have range $\{0,1\}$, whose
elements are idempotent under $\times$, we are able to reduce the runtime down to $O(N)$. The key point is
that running InsideOut with the variable ordering $(X_5, X_1, X_2, X_3, X_4, X_6)$ would have achieved a runtime of $O(N)$.
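
The equivalence of the two orderings under the $\{0,1\}$ promise can be sanity-checked by brute force. The sketch below is not InsideOut and takes exponential time; it only evaluates the query of this example under a given aggregate order on random $\{0,1\}$-valued factors. The helper names (`random_01_factors`, `evaluate`, `orderings_agree`) and the two-element variable domains are assumptions made for illustration.

```python
import itertools
import math
import random

# Factor scopes of the example query; all names here are illustrative only.
SCOPES = [(1, 5), (2, 5), (1, 3, 4), (2, 3, 6)]
# Aggregate attached to each variable: max, max, product, sum, max, max.
AGG = {1: max, 2: max, 3: math.prod, 4: sum, 5: max, 6: max}
DOM = (0, 1)  # assumed two-element domain for every variable


def random_01_factors(rng):
    """Draw one {0,1}-valued table per factor scope."""
    return {
        s: {xs: rng.randint(0, 1) for xs in itertools.product(DOM, repeat=len(s))}
        for s in SCOPES
    }


def evaluate(order, factors, assignment=None):
    """Brute-force evaluation, aggregating variables in `order` (outermost first)."""
    assignment = assignment or {}
    if not order:
        return math.prod(
            factors[s][tuple(assignment[v] for v in s)] for s in SCOPES
        )
    v, rest = order[0], order[1:]
    return AGG[v](evaluate(rest, factors, {**assignment, v: x}) for x in DOM)


def orderings_agree(trials=200, seed=0):
    """Compare the original ordering with (X5, X1, X2, X3, X4, X6)."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = random_01_factors(rng)
        if evaluate((1, 2, 3, 4, 5, 6), f) != evaluate((5, 1, 2, 3, 4, 6), f):
            return False
    return True
```

Calling `orderings_agree()` compares the two orderings on 200 random instances. Agreement on samples is evidence rather than a proof; the derivation above is what establishes the equivalence.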

The main technical contributions of this paper lie in answering the following two questions: 

1. How do we know which variable orderings are equivalent to the original FAQ-query expression? 

2. How do we find the “best” variable ordering among all equivalent variable orderings in an efficient way 
(i.e. better than brute-force search)? 

The next sections formalize the above two questions. 


5.4 Equivalent variable orderings 

We first formalize the concept of “equivalent variable ordering”: 

Definition 5.7 (EVO($\varphi$)). Let $\varphi$ be an FAQ-query written in the form (1). A $\varphi$-equivalent variable ordering
is a vertex ordering $\sigma = (v_1, \ldots, v_n)$ of the hypergraph $\mathcal{H}$ satisfying the following conditions:

(a) The set $\{v_1, \ldots, v_f\}$ is exactly $F = [f]$. In other words, in the $\varphi$-equivalent variable ordering, the free
variables come first (in any order).

(b) The function $\varphi'$ defined by

$$\varphi'(\mathbf{x}_F) := \bigoplus^{(v_{f+1})}_{x_{v_{f+1}}} \cdots \bigoplus^{(v_n)}_{x_{v_n}} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S)$$

is identical to the function $\varphi$ no matter what the input factors are (see footnote 10).

Let EVO($\varphi$) denote the set of all $\varphi$-equivalent variable orderings.

In many applications, we know a specific class of input factors that are allowed in the corresponding
FAQ problem. For example, in logic applications the input factors are often $\{0,1\}$-valued functions. This
motivates a stronger definition of EVO.

10 Here, we assume that the variable domains, the range D, and the aggregates are fixed and known in advance, but the input 
factors are not. 

Definition 5.8 (EVO($\varphi, \mathcal{F}$)). Let $\mathcal{F}$ be a class of functions with range $\mathbf{D}$. Let EVO($\varphi, \mathcal{F}$) denote the set of
all $\varphi$-equivalent variable orderings under the promise that all input factors come from $\mathcal{F}$. In other words,
the definition of EVO($\varphi, \mathcal{F}$) is the same as that in Definition 5.7, except that we only require $\varphi'$ to be the
same as $\varphi$ for all input factors belonging to $\mathcal{F}$.

5.5 FAQ-width of a variable ordering 

We now address the second question of what it means to be the “best” variable ordering. With respect to the
analysis of InsideOut in Theorem 5.5, naturally the “best” variable ordering is a variable ordering in EVO($\varphi$)
which minimizes the runtime (14). The EVO constraint makes it trickier to find an optimal variable ordering
for the general FAQ problem, as opposed to the typical variable elimination in the graphical model, matrix
computation, and constraint satisfaction domains where all variable orderings are in EVO($\varphi$). To illustrate
this point, Section E in the appendix presents two examples (namely MCM and DFT) where minimizing
(14) is easy. These examples explain two entries in the summary Table 1.

In general, however, without knowing a bit more about the structure of the problem it is hard(er) to 
derive a general result on how to find a variable ordering to minimize (14). In particular, in these examples 
we have been lucky that every permutation of the variable aggregates yields an expression that is equivalent 
to the original FAQ expression (1). This is because in these examples there is only one variable aggregate 
which is the + operator. The second thing that helps with these examples is that we know (a good estimate 
of) the sizes of the indicator projections; so we can plug them in expression (14). In general, these sizes are 
highly instance-dependent. 

Minimizing (14) is a bit unwieldy, thus we slightly relax the runtime analysis of InsideOut to obtain a 
better behaved expression than that in (14). A proof of the following is in Appendix D. 

Proposition 5.9. Following notations defined in Theorem 5.5, the InsideOut algorithm (Algorithm 1)
computes $\varphi$ in time



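[Displayed bound (15): the formula is illegible in this copy. As the discussion that follows states, it is governed by $N^{\max_{k \in K} \rho^*_{\mathcal{H}_k}(U_k)}$ and the output size $\|\varphi\|$, up to factors polynomial in the query size and logarithmic in $N$.]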


In addition to the term $\|\varphi\|$ required to report the output, the key parameter that controls the
complexity of the algorithm is the quantity $\max_{k \in K} \rho^*_{\mathcal{H}_k}(U_k)$. This quantity is a function of the variable
ordering $X_1, \ldots, X_n$ we chose to write the input query $\varphi$ on. As mentioned above, there might be multiple
ways of writing the same FAQ query, leading to wildly different runtimes for InsideOut. We have seen this
effect in Example 5.6, and have formalized what an “equivalent ordering” means in the previous section.
The following definition follows naturally:

Definition 5.10 (Fractional FAQ-width of a variable ordering). Let $\sigma$ be a $\varphi$-equivalent variable ordering.
Define the sequence of hypergraphs $\mathcal{H}^\sigma_k$ along with the sets $U^\sigma_k$ as in Definition 5.4 (but with respect to $\sigma$).
The fractional FAQ-width of a variable ordering $\sigma$ is the quantity

$$\mathrm{faqw}(\sigma) := \max_{k \in K} \left\{ \rho^*_{\mathcal{H}^\sigma_k}(U^\sigma_k) \right\}. \qquad (16)$$

From the above definition, we can interpret Proposition 5.9 as basically saying that InsideOut runs in
time $\tilde{O}(N^{\mathrm{faqw}(\sigma)} + \|\varphi\|)$, where $\tilde{O}$ hides a factor that is polynomial in query size and logarithmic in data size.

In the next few sections, we study the main problem of how to select a $\varphi$-equivalent variable ordering $\sigma$
with the minimum faqw($\sigma$).

Definition 5.11. The following quantity is called the FAQ-width of an FAQ-query $\varphi$:

$$\mathrm{faqw}(\varphi) \stackrel{\mathrm{def}}{=} \min\{\mathrm{faqw}(\sigma) \mid \sigma \in \mathrm{EVO}(\varphi)\}.$$

In some cases, EVO($\varphi$) consists of all $n!$ permutations, making it “easy” to solve the above optimization
problem. Appendix F.1 presents several immediate consequences of this easy case. In particular, faqw
generalizes fractional hypertree width (fhtw, see [47]), because the following follows directly from Corollary 4.13:

Proposition 5.12. Let $\varphi$ be an FAQ query with hypergraph $\mathcal{H}$. If EVO($\varphi$) contains all $n!$ variable orderings,
then faqw($\varphi$) = fhtw($\mathcal{H}$).

In general, however, the question of determining whether a given variable ordering $\sigma$ belongs to EVO($\varphi$)
is a tricky question to answer formally. In particular, the answer depends on what exactly we mean by a
variable aggregate, a factor aggregate, the variable domain sizes, and the range $\mathbf{D}$. This difficulty is analogous
to the situation in logic when one wants to decide whether two (first-order, e.g.) formulas of specific forms
are logically equivalent [17] (see footnote 11).

Our approach to solving this problem was outlined in Figure 1, and is summarized again as follows. 

• We define a class of variable orderings for a given input FAQ-query $\varphi$. This class will be precisely the
set of linear extensions LinEx($P$) of a partially ordered set (poset) on variables called the precedence
poset $P$. The precedence poset is defined on a tree called the expression tree of the input query $\varphi$. The
expression tree can be constructed in polynomial time in query complexity.

• We show that every variable ordering in LinEx($P$) is $\varphi$-equivalent. This is the “soundness” of LinEx($P$).

• We define a combinatorial notion called component-wise equivalence, which is a relation between pairs of
variable orderings. We show that component-wise equivalence preserves EVO-membership and preserves
faqw.

• We show that every variable ordering in EVO is component-wise equivalent to some ordering in
LinEx($P$). This is called “strong completeness” of LinEx($P$). In particular, by tracing component-wise
equivalent variable orderings starting from LinEx($P$), one can list all of EVO in exponential time and
find the ordering that minimizes faqw. (Moreover, by directly applying the definition of component-wise
equivalence, we can check whether a given variable ordering is in EVO in polynomial time.)

• However, we can do better. We prove that the variable orderings in LinEx($P$) are all we need to consider,
because every $\varphi$-equivalent variable ordering $\sigma$ either belongs to LinEx($P$) or satisfies faqw($\sigma$) = faqw($\pi$) for
some $\pi \in$ LinEx($P$). This shows the “completeness” of LinEx($P$) as far as the width is concerned.

The completeness result rests on the assumption that different variable aggregates do not commute. 
(Even this simple statement needs clarification, which is done in Proposition 6.6.) Because the final result is 
a bit technically involved, we present our results incrementally in several stages, by relaxing one assumption 
at a time. Each time an assumption is relaxed, a couple of ideas are introduced to deal with the relaxation. 
It should be noted, however, that in the end there is only one theorem and one algorithm. 

In Appendix F.1 we describe the above steps when applied to FAQ-SS without free variables (i.e.
SumProd), which is the case when determining EVO($\varphi$) is trivial. Appendix F.2 covers a simple but
non-trivial case when $\varphi$ has two blocks of semiring aggregates. The reader who would like to read at a slower
pace can start with those two sections in the appendix, where we also connect our results to known results
in PGM, joins, Yannakakis algorithm, and #CQ.

In Section 6.1, we present our solution for the case when there is an arbitrary number of semiring 
aggregates but no product aggregates. This is when the idea of an expression tree is introduced. Finally, in 
Sections 6.2 and 6.3 we cover the most general cases. 

11 We thank Balder ten Cate for pointing out to us the essence of the difficulty and the reference. 

6 Characterizing equivalent variable orderings

6.1 FAQ with only semiring aggregates

This section presents the characterization of EVO for FAQ queries where every variable aggregate forms a
semiring with the product aggregate. (Note that there can be an arbitrary number of different types of
semirings.) In particular, we consider FAQ-queries of the form

$$\varphi(\mathbf{x}_{[f]}) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) \qquad (17)$$

where $(\mathbf{D}, \oplus^{(i)}, \otimes)$ is a semiring for every $i > f$ (i.e. the set $K$ as defined in (13) is $[n]$).

The main aim in this section is to illustrate the key technical idea of the expression tree. The expression
tree defines the precedence poset. One key component of our completeness results is the notion of
component-wise equivalence between two variable orderings. Component-wise equivalence preserves FAQ-width and
$\varphi$-equivalence. We will show two important facts about the precedence poset:

• (Soundness) Every linear extension of the precedence poset is a $\varphi$-equivalent variable ordering.

• (Completeness) Every $\varphi$-equivalent variable ordering is component-wise equivalent to (hence has the
same FAQ-width as) some linear extension of the precedence poset. Therefore, EVO($\varphi$) is
completely characterized using component-wise equivalence and the precedence poset. Moreover, to
compute or approximate faqw($\varphi$), we only need to consider linear extensions of the precedence poset.

In Section 7, we will use the structure of the expression tree to compute a variable ordering to approximate
faqw($\varphi$), using an approximation algorithm for fhtw as a blackbox. Thus, the expression tree is crucial in
guiding the construction of a good variable ordering.

6.1.1 Expression tree and precedence poset

The expression tree is defined on a sequence of tagged variables along with a hypergraph. In such a sequence,
every vertex $i$ (or equivalently variable $X_i$) is tagged with its corresponding operator $\oplus^{(i)}$; or, if the variable
is a free variable then its tag is free. Given a sequence $\sigma$ of tagged variables, a tag block is a maximal
subsequence of consecutive variables in $\sigma$ with the same tag. The first tag block of a sequence $\sigma$ of tagged
variables is the longest prefix of $\sigma$ consisting of variables of the same tag.

Definition 6.1. (Expression tree) The expression tree for $\varphi$ is a rooted tree $P$. Every node of the tree is a
set of variables. We construct the expression tree using two steps: the compartmentalization step and the
compression step. In the compartmentalization step, we construct the expression tree based on the connected
component structures of the FAQ-query relative to the hypergraph structure $\mathcal{H}$. In the compression step we
collapse the tree to make it shorter whenever possible.

Compartmentalization. In this step, initially we start off with the sequence of variables with their
corresponding tags exactly as written in (17). In particular, the sequence starts with $f$ free variables (whose
tags are ‘free’), and then the $i$'th variable with tag $\oplus^{(i)}$ for $i = f+1, \ldots, n$. For technical reasons, we add a
dummy variable $X_0$ to the beginning of the sequence with a free tag too. So the sequence we start off with
is the following

$$\sigma = \bigl((X_0, \text{‘free’}), \ldots, (X_f, \text{‘free’}), (X_{f+1}, \oplus^{(f+1)}), \ldots, (X_n, \oplus^{(n)})\bigr).$$

The vertex $X_0$ is an isolated vertex of the hypergraph $\mathcal{H}$. Now given a tagged variable sequence $\sigma$ and a
hypergraph $\mathcal{H}$, we build the tree by constructing a node $L$ containing all variables in the first tag block of
$\sigma$. This node $L$ will be the root of the expression tree. (The effect of the dummy variable $X_0$ is that, even if
the original query has no free variable, the first block $L$ still has $X_0$ in it.) If $L$ contains all variables already,
then naturally the tree has only one node. Otherwise, for each connected component $C = (V(C), \mathcal{E}(C))$ of
$\mathcal{H} - L$ we construct a sequence $\sigma_C$ of tagged variables by listing all variables in $V(C)$ in exactly the same
relative order as they appeared in $\sigma$. From the sequence $\sigma_C$ and the hypergraph $C$, we recursively construct
the expression tree $P_C$. Finally, we connect all the roots of the (sub)expression trees $P_C$ to the node $L$. This
completes the compartmentalization step. After everything is done, we also remove the dummy variable $X_0$
from the expression tree $P$. If originally there was no free variable, the tree has an empty root node and the
subtrees correspond to the connected components of $\mathcal{H}$. (See Example 6.2 and Figure 2.)

Compression. Now, in the expression tree $P$ that resulted from the compartmentalization step, as long
as there is still a node $L$ whose tag is the same as that of a child node $L'$ of the tree $P$, we merge the child into $L$;
namely, we set $L := L \cup L'$, remove $L'$, and connect all subtrees under $L'$ to become subtrees of $L$. Repeat
this step until no further merging is possible. (See Figure 3.)

Note that the compression step can make some nodes $L$ larger and the final tree $P$ shorter than the
tree that resulted from the compartmentalization step alone. This step is crucial for getting the correct
expression tree. If $\varphi$ is an instance of FAQ-SS, then $P$ is a tree of depth $\le 1$ where the root node contains
all free variables (if any) and its children (if any) contain the rest of the variables.

Example 6.2 (Intuition behind expression tree). Consider the following FAQ query that has two different
semiring aggregates ($\sum$ and $\max$) and no free variables:

$$\varphi = \sum_{x_1}\sum_{x_2}\max_{x_3}\sum_{x_4}\sum_{x_5}\max_{x_6}\max_{x_7} \psi_{12}\,\psi_{135}\,\psi_{14}\,\psi_{246}\,\psi_{27}\,\psi_{37}.$$

($\psi_{12}$ above denotes a factor whose support is $\{X_1, X_2\}$, and so on.) The hypergraph of $\varphi$ is depicted in
Figure 2a. The compartmentalization step of the construction of the expression tree is depicted in Figures 2b
through 2d. Figures 3a and 3b depict the compression step. The final expression tree appears on the right
of Figure 3b.


Definition 6.3. (Precedence poset) The expression tree defines a partial order on the variables. Abusing
notation we will also use $P$ to denote the partial order $([n], \prec)$. In this poset, $u \prec v$ whenever $u \in L$, $v \in L'$,
and $L'$ is a (strict) descendant of $L$ in the expression tree $P$. In particular, variables in the same node of the
expression tree are not comparable in this partial order. We call this partial order the precedence poset. Let
LinEx($P$) denote the set of all linear extensions of the poset.
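
The two construction steps and the resulting poset can be sketched in a few lines. The code below is an illustrative reading of Definitions 6.1 and 6.3, not the authors' implementation: the dictionary-based node layout, the function names, and the convention that variables are positive integers (with 0 reserved for the dummy variable) are all assumptions of this sketch. The instance at the bottom is the query of Example 6.2.

```python
from itertools import permutations


def components(vertices, edges):
    """Connected components of the hypergraph induced on `vertices`."""
    remaining, comps = set(vertices), []
    while remaining:
        start = min(remaining)
        comp, frontier = {start}, [start]
        while frontier:
            u = frontier.pop()
            for e in edges:
                if u in e:
                    for w in e:
                        if w in remaining and w not in comp:
                            comp.add(w)
                            frontier.append(w)
        comps.append(comp)
        remaining -= comp
    return comps


def compartmentalize(seq, tags, edges):
    """Root = first tag block of `seq`; recurse on each component of the rest."""
    k = 1
    while k < len(seq) and tags[seq[k]] == tags[seq[0]]:
        k += 1
    rest = seq[k:]
    children = [
        compartmentalize([v for v in rest if v in comp], tags, edges)
        for comp in components(rest, edges)
    ]
    return {"vars": set(seq[:k]), "tag": tags[seq[0]], "children": children}


def compress(node):
    """Merge every child whose tag equals its parent's tag into the parent."""
    merged = True
    while merged:
        merged = False
        for child in list(node["children"]):
            if child["tag"] == node["tag"]:
                node["vars"] |= child["vars"]
                node["children"] = [c for c in node["children"] if c is not child]
                node["children"] += child["children"]
                merged = True
    for child in node["children"]:
        compress(child)
    return node


def expression_tree(free, bound, tags, edges):
    """Build the tree with a dummy free variable 0, then drop the dummy."""
    all_tags = {0: "free", **{v: "free" for v in free}, **tags}
    root = compress(compartmentalize([0, *free, *bound], all_tags, edges))
    root["vars"].discard(0)
    return root


def precedence_pairs(node, ancestors=frozenset()):
    """All pairs (u, v) where u sits in a strict ancestor of v's node."""
    pairs = {(u, v) for u in ancestors for v in node["vars"]}
    for child in node["children"]:
        pairs |= precedence_pairs(child, ancestors | node["vars"])
    return pairs


def linear_extensions(variables, pairs):
    """Brute-force enumeration; only sensible for a handful of variables."""
    result = []
    for perm in permutations(variables):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in pairs):
            result.append(perm)
    return result


# The query of Example 6.2: tags per variable and factor supports.
tags = {1: "sum", 2: "sum", 3: "max", 4: "sum", 5: "sum", 6: "max", 7: "max"}
edges = [{1, 2}, {1, 3, 5}, {1, 4}, {2, 4, 6}, {2, 7}, {3, 7}]
tree = expression_tree([], [1, 2, 3, 4, 5, 6, 7], tags, edges)
pairs = precedence_pairs(tree)
```

On this instance the sketch produces an empty root with a single child $\{1,2,4\}$, whose children are $\{3,7\}$ (with child $\{5\}$) and $\{6\}$, in line with the description of Figure 3.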


6.1.2 What makes two aggregates different? 

Before proving soundness and completeness, we need a small technical detour. Recall that aggregates are
simply binary operators under $\mathbf{D}$.

Definition 6.4. (Different aggregates) Two aggregates $\oplus$ and $\bar\oplus$ are different if there is a pair $a, b \in \mathbf{D}$ such
that

$$a \oplus b \neq a \mathbin{\bar\oplus} b.$$

Otherwise they are identical.

Definition 6.5. (Commutative aggregates) Two aggregates $\oplus$ and $\bar\oplus$ are said to be commutative if for every
$a, b, c, d \in \mathbf{D}$, we have $(a \oplus b) \mathbin{\bar\oplus} (c \oplus d) = (a \mathbin{\bar\oplus} c) \oplus (b \mathbin{\bar\oplus} d)$.

Recall that in FAQ, all semirings share the same 0 (since one ‘0’ must annihilate the rest), and this 0 is the
identity of every semiring aggregate. Thus, if we select $a = d = 0$ in the above equality, then we obtain
$b \mathbin{\bar\oplus} c = b \oplus c$, for every $b, c \in \mathbf{D}$. This means

Proposition 6.6. Commutative aggregates are identical aggregates. Conversely, non-commutative
aggregates are different aggregates.

(Note that it is possible for semantically different aggregates to be identical under $\mathbf{D}$ by accident. For
example, in the $\{0,1\}$ domain $\min$ and $\times$ are identical.) In this paper, we assume that two different aggregates
in the input FAQ-expression are not functionally identical. Recall that we also assumed $|\mathrm{Dom}(X_i)| \ge 2$ for
every $i \in [n]$ (otherwise the aggregate on that $X_i$ is trivial and can be ignored).
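
Definitions 6.4 and 6.5 can be checked mechanically on a finite sample of values. The following is a minimal sketch; the function names and the choice of sample values are assumptions for illustration, and a check over a finite sample can only refute, not prove, a property over an infinite domain.

```python
from itertools import product
import operator


def identical(op1, op2, values):
    """Definition 6.4 on a sample: no pair (a, b) separates the two operators."""
    return all(op1(a, b) == op2(a, b) for a, b in product(values, repeat=2))


def commutative(op1, op2, values):
    """Definition 6.5 on a sample:
    (a op1 b) op2 (c op1 d) == (a op2 c) op1 (b op2 d)."""
    return all(
        op2(op1(a, b), op1(c, d)) == op1(op2(a, c), op2(b, d))
        for a, b, c, d in product(values, repeat=4)
    )
```

For instance, `commutative(operator.add, max, range(4))` is false, with $a = d = 0$ and $b = c = 1$ as a separating choice, whereas `identical(min, operator.mul, (0, 1))` is true, matching the remark about $\min$ and $\times$ on the $\{0,1\}$ domain.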

Figure 2: The compartmentalization step of the expression tree from Example 6.2, depicted using colors.
(2a) depicts the hypergraph of the FAQ query $\varphi = \sum_{x_1}\sum_{x_2}\max_{x_3}\sum_{x_4}\sum_{x_5}\max_{x_6}\max_{x_7} \psi_{12}\,\psi_{135}\,\psi_{14}\,\psi_{246}\,\psi_{27}\,\psi_{37}$.
For simplicity, the dummy free variable $X_0$ is ignored in this example. (2b) shows the first part of the
compartmentalization step, where the first tag block is $L = \{1,2\}$. After removing $L$, the query breaks
into two connected components: the red and the blue. The expression tree at this point appears on the
right. Each color is used to denote correspondence between parts of the query expression, hypergraph, and
expression tree having that color. (2c) shows how to apply compartmentalization recursively on the red
component, while (2d) shows the blue component compartmentalization.

Figure 3: The compression step of the expression tree from Example 6.2. Recall that Figure (2d) (right)
depicted the expression tree at the end of the compartmentalization step. (3a) shows a compression where
node $\{7\}$ is merged into its parent $\{3\}$ (since they both have the same tag “max”). (3b) shows another
compression where $\{4\}$ is merged into $\{1,2\}$. Since no further compression is possible, (3b) (right) depicts
the final expression tree.
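
Proposition 6.7. Suppose $\oplus$ and $\bar\oplus$ are different binary operators (under the domain $\mathbf{D}$). Then for every
$i, j \in [n]$, there is a function $\phi_{ij} : \mathrm{Dom}(X_i) \times \mathrm{Dom}(X_j) \to \mathbf{D}$ for which

$$\bigoplus_{x_i \in \mathrm{Dom}(X_i)} \; \bar{\bigoplus_{x_j \in \mathrm{Dom}(X_j)}} \phi_{ij}(x_i, x_j) \;\neq\; \bar{\bigoplus_{x_j \in \mathrm{Dom}(X_j)}} \; \bigoplus_{x_i \in \mathrm{Dom}(X_i)} \phi_{ij}(x_i, x_j). \qquad (18)$$

Proof. From the analysis above, the two operators do not commute. Hence, there are four members $a, b, c, d \in \mathbf{D}$
so that $(a \oplus b) \mathbin{\bar\oplus} (c \oplus d) \neq (a \mathbin{\bar\oplus} c) \oplus (b \mathbin{\bar\oplus} d)$. Fix arbitrary elements $x_i^1 \neq x_i^2 \in \mathrm{Dom}(X_i)$ and
$x_j^1 \neq x_j^2 \in \mathrm{Dom}(X_j)$. Define

$$\phi_{ij}(x_i, x_j) = \begin{cases} a & \text{if } (x_i, x_j) = (x_i^1, x_j^1) \\ b & \text{if } (x_i, x_j) = (x_i^2, x_j^1) \\ c & \text{if } (x_i, x_j) = (x_i^1, x_j^2) \\ d & \text{if } (x_i, x_j) = (x_i^2, x_j^2) \\ 0 & \text{otherwise.} \end{cases} \qquad (19)$$

Then,

$$\bigoplus_{x_i} \bar{\bigoplus_{x_j}} \phi_{ij}(x_i, x_j) = (a \mathbin{\bar\oplus} c) \oplus (b \mathbin{\bar\oplus} d) \neq (a \oplus b) \mathbin{\bar\oplus} (c \oplus d) = \bar{\bigoplus_{x_j}} \bigoplus_{x_i} \phi_{ij}(x_i, x_j).$$

□

Given $i, j \in [n]$, $x_i^1 \neq x_i^2 \in \mathrm{Dom}(X_i)$, and $x_j^1 \neq x_j^2 \in \mathrm{Dom}(X_j)$, we define an ‘identity’ function
$\bar\phi_{ij} : \mathrm{Dom}(X_i) \times \mathrm{Dom}(X_j) \to \mathbf{D}$ as follows.

$$\bar\phi_{ij}(x_i, x_j) = \begin{cases} 1 & \text{if } (x_i, x_j) = (x_i^1, x_j^1) \text{ or } (x_i, x_j) = (x_i^2, x_j^2) \\ 0 & \text{otherwise.} \end{cases} \qquad (20)$$

We will use both $\phi_{ij}$ and $\bar\phi_{ij}$ in the proofs below.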
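
To see the kind of witness function that Proposition 6.7 asserts, take $+$ and $\max$ over the non-negative integers. The sketch below tabulates a function that is $a, b, c, d$ on a $2 \times 2$ corner and 0 elsewhere, and then aggregates it in both orders. The function names, the three-element domains, and the particular placement of $a, b, c, d$ in the corner are choices made for this illustration.

```python
import operator
from functools import reduce


def aggregate(op, values):
    """Fold a non-empty sequence with a binary operator."""
    return reduce(op, values)


def witness(a, b, c, d, dom_i, dom_j):
    """Table that is a, b, c, d on a 2x2 corner and 0 elsewhere.
    The placement of the four values is this sketch's own choice."""
    xi1, xi2 = dom_i[0], dom_i[1]
    xj1, xj2 = dom_j[0], dom_j[1]
    corner = {(xi1, xj1): a, (xi2, xj1): b, (xi1, xj2): c, (xi2, xj2): d}
    return {(xi, xj): corner.get((xi, xj), 0) for xi in dom_i for xj in dom_j}


def both_orders(op_i, op_j, phi, dom_i, dom_j):
    """Aggregate X_i with op_i and X_j with op_j, in the two possible orders."""
    i_outer = aggregate(
        op_i, [aggregate(op_j, [phi[xi, xj] for xj in dom_j]) for xi in dom_i]
    )
    j_outer = aggregate(
        op_j, [aggregate(op_i, [phi[xi, xj] for xi in dom_i]) for xj in dom_j]
    )
    return i_outer, j_outer


dom = (0, 1, 2)
phi = witness(0, 1, 1, 0, dom, dom)
sum_outer, max_outer = both_orders(operator.add, max, phi, dom, dom)
```

With $a = d = 0$ and $b = c = 1$, summing over $X_i$ on the outside gives 2 while maximizing over $X_j$ on the outside gives 1, so the two aggregation orders disagree on this table.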


6.1.3 Soundness and completeness 

We are now fully equipped to show that LinEx($P$) is sound.

Theorem 6.8 (LinEx($P$) $\subseteq$ EVO($\varphi$)). Every linear extension of the precedence poset $P$ is $\varphi$-equivalent.

Proof. Let $P$ be the expression tree constructed using only the compartmentalization step. This expression
tree already defines a poset on variables. We will first show that every linear extension of this
compartmentalization poset is $\varphi$-equivalent. We prove this claim by induction on the number of tag blocks of the input
sequence. Let $\sigma$ denote the input sequence of tagged variables with input hypergraph $\mathcal{H}$.

In the base case $\sigma$ has only one tag block. All variables in the sequence belong to the same node of the
compartmentalization expression tree. This means every permutation of variables is a linear extension of
the poset, which is what we expect because every permutation is $\varphi$-equivalent.

In the inductive step, suppose $\sigma$ has at least two tag blocks with the first block being the set $L$ of variables.
Then, each sub-sequence $\sigma_C$ for each connected component $C$ of $\mathcal{H} - L$ defines an FAQ-expression $\varphi_C$ on the
set of conditional factors $\{\psi_S(\cdot \mid \mathbf{x}_L) \mid S \in \mathcal{E} \wedge S \cap V(C) \neq \emptyset\}$. When we condition on the first $L$ variables,
the expression $\varphi(\cdot \mid \mathbf{x}_L)$ completely factorizes into a product of the FAQ-expressions $\varphi_C$. (Another way to
put this is that $\varphi$ can be written as a series of aggregates on variables in $L$, with a product of the $\varphi_C$ inside.) By
induction, every linear extension of the compartmentalization poset for $\sigma_C$ is $\varphi_C$-equivalent. Those linear
extensions can be put together in an arbitrary interleaving way to form $\varphi(\cdot \mid \mathbf{x}_L)$. This observation completes
the proof of the claim, because every linear extension of the expression poset for $\varphi$ consists of variables in
$L$, followed by arbitrary interleavings of linear extensions of the expression posets for the $\varphi_C$.

Next, we consider the expression tree after the compression step. We show that every linear extension
of the final precedence poset is $\varphi$-equivalent by induction on the number of merges of a child node $L'$ to
a parent node $L$ (which both must have the same tag). To see this, we can take a linear extension $\sigma$ of
the expression tree before the merge where all variables in $L$ and in $L'$ are consecutive in $\sigma$. Then, because
all variables in $L \cup L'$ have the same tag, we can permute them in any way and still obtain a $\varphi$-equivalent
variable ordering. □

It would have been nice if every $\varphi$-equivalent variable ordering were a linear extension of $P$. Unfortunately
this is not true. Consider the following FAQ-query

$$\varphi = \sum_{x_1}\sum_{x_2}\max_{x_3}\max_{x_4}\sum_{x_5} \psi_{15}\,\psi_{25}\,\psi_{13}\,\psi_{24},$$

where all factors have range $\mathbb{R}_+$ so that all variables are semiring variables. In this case the expression tree
consists of an empty root, a node containing $\{1,2,5\}$, and two children $\{3\}$ and $\{4\}$ of that node. The linear
extensions will enforce that $X_1, X_2, X_5$ come before $X_3$ and $X_4$. However, it is easy to see that we can
rewrite $\varphi$ as follows.

\begin{align*}
\varphi &= \sum_{x_1}\sum_{x_2}\max_{x_3}\max_{x_4}\sum_{x_5} \psi_{15}\,\psi_{25}\,\psi_{13}\,\psi_{24} \\
&= \sum_{x_5}\sum_{x_1}\sum_{x_2}\max_{x_3}\max_{x_4} \psi_{15}\,\psi_{25}\,\psi_{13}\,\psi_{24} \\
&= \sum_{x_5} \Bigl(\sum_{x_1}\max_{x_3} \psi_{15}\,\psi_{13}\Bigr) \cdot \Bigl(\sum_{x_2}\max_{x_4} \psi_{25}\,\psi_{24}\Bigr).
\end{align*}

This means that, when conditioned on $X_5$, the expression factorizes, and we can multiply the two parts back
together in a way that allows 4 to come before 1, or 3 to come before 2; namely,

\begin{align*}
\varphi &= \sum_{x_5}\sum_{x_2}\max_{x_4}\sum_{x_1}\max_{x_3} \psi_{15}\,\psi_{25}\,\psi_{13}\,\psi_{24} \\
&= \sum_{x_5}\sum_{x_1}\max_{x_3}\sum_{x_2}\max_{x_4} \psi_{15}\,\psi_{25}\,\psi_{13}\,\psi_{24}.
\end{align*}

However, it can be verified that

$$\mathrm{faqw}(5,1,3,2,4) = \mathrm{faqw}(5,2,4,1,3) = \mathrm{faqw}(\sigma)$$

for any $\sigma \in$ LinEx($P$) where 5 comes first in $\sigma$, and where $P$ is the expression tree of the query. Note that
we can take the linear extensions of the factorized components $\{1,3\}$ and $\{2,4\}$ and interleave them in any
way, as long as we still respect their relative order within each component. However, these interleavings do
not add anything of value as far as the faqw is concerned.

Another way to think about the above example is that we could have arbitrarily selected one variable
in the first tag block of $\varphi$ and constructed a compartmentalization expression tree with that variable as the root.
(One variable at a time instead of one tag block at a time.) Then, by the same reasoning we used in the
proof of Theorem 6.8, every linear extension of this ‘variable-wise’ poset is $\varphi$-equivalent. However, this idea
alone also does not work because it will forbid the selection of $X_5$ as the first variable in the above example.
Thus, it is crucial that we construct the compressed expression tree first, to determine which variable can
come first in a $\varphi$-equivalent variable ordering. Ultimately, the set LinEx($P$) gives us a canonical way of listing
the variable orderings that really matter in evaluating $\varphi$.

In what follows, we implement the above informal discussion and intuition by showing that every
$\varphi$-equivalent variable ordering has the same width as some ordering in LinEx($P$). The following lemma says
that the expression tree indeed gives us a complete list of variables that can occur first (after the free
variables) in any $\varphi$-equivalent ordering.

Lemma 6.9. For every variable ordering $\pi = (u_1, \ldots, u_n) \in$ EVO($\varphi$), the variable $u_{f+1}$ must belong to a
child node of the root of the expression tree (see footnote 12).

12 Recall that $\{u_1, \ldots, u_f\}$ are the free variables, which are located in the root of the expression tree. And, if $f = 0$ then the
root of the expression tree is empty.

Proof. Let $\varphi^\pi$ denote the function defined by the FAQ-query with $\pi$ as the variable ordering (over the same
input factors as $\varphi$). Our aim is to show that if the conclusion of the lemma does not hold then there exist
input factors $\psi_S$ for which $\varphi^\pi \neq \varphi$.

Suppose for the sake of contradiction that $u_{f+1}$ belongs to a node $L$ whose parent is $L_p$, and $L_p$ is not
the root of the expression tree $P$. Let $L_a$ denote the union of all the (strict) ancestors of $L_p$ in the expression
tree. From the construction of the expression tree, the vertices in the set $L \cup L_p$ belong to the same
connected component of the graph $\mathcal{H} - L_a$. Let $i_0 := u_{f+1}, i_1, \ldots, i_k \in L_p$ be the shortest path in the
Gaifman graph of $\mathcal{H} - L_a$ from $u_{f+1}$ to $L_p$. Then, the vertices $i_1, \ldots, i_{k-1}$ do not belong to $L_p \cup L_a$; and,
there are distinct hyperedges $S_1, \ldots, S_k$ of $\mathcal{H}$ such that $\{i_{j-1}, i_j\} \subseteq S_j$ for all $j \in [k]$.

For each variable $i \in \{i_0, \ldots, i_k\}$, we fix two arbitrary values $x_i^1 \neq x_i^2 \in \mathrm{Dom}(X_i)$ and for each
$i' \in [n] \setminus \{i_0, \ldots, i_k\}$, we fix one arbitrary value $e_{i'} \in \mathrm{Dom}(X_{i'})$. For the sake of brevity, denote
$\oplus = \oplus^{(i_0)}$ and $\bar\oplus = \oplus^{(i_k)}$.

Now, we construct an input set of factors $\psi_S$, $S \in \mathcal{E}$, for which $\varphi^\pi \neq \varphi$.

• Define the factor $\psi_{S_k}$ by

$$\psi_{S_k}(\mathbf{x}_{S_k}) := \begin{cases} \phi_{i_{k-1} i_k}(x_{i_{k-1}}, x_{i_k}) & \text{if } x_{i'} = e_{i'} \text{ for all } i' \in S_k \setminus \{i_0, \ldots, i_k\} \\ 0 & \text{otherwise,} \end{cases}$$

where $\phi_{i_{k-1} i_k}$ is the function defined in (19).

• For every $j \in [k-1]$, define

$$\psi_{S_j}(\mathbf{x}_{S_j}) := \begin{cases} \bar\phi_{i_{j-1} i_j}(x_{i_{j-1}}, x_{i_j}) & \text{if } x_{i'} = e_{i'} \text{ for all } i' \in S_j \setminus \{i_0, \ldots, i_k\} \\ 0 & \text{otherwise,} \end{cases}$$

where $\bar\phi_{i_{j-1} i_j}$ is the ‘identity’ function defined in (20). (Think of these factors as little $2 \times 2$ identity
matrices.)

• Finally, for every $S' \in \mathcal{E} \setminus \{S_1, \ldots, S_k\}$, define

$$\psi_{S'}(\mathbf{x}_{S'}) := \begin{cases} 1 & \text{if } x_{i'} = e_{i'} \text{ for all } i' \in S' \setminus \{i_0, \ldots, i_k\} \\ 0 & \text{otherwise.} \end{cases}$$

Because $i_0$ is the first in $\pi$ after the free variables, $\varphi^\pi(e_1, \ldots, e_f)$ will evaluate to the left hand side of (18).
(Imagine running the InsideOut algorithm to evaluate $\varphi^\pi$.) Next, to get a contradiction, we pick an ordering
$\sigma \in$ LinEx($P$) such that $i_0$ precedes all variables that are at the same or lower level in the expression tree.
By Theorem 6.8, we know that $\sigma \in$ EVO($\varphi$). In $\sigma$, $i_k$ precedes $i_0$ which in turn precedes all of $\{i_1, \ldots, i_{k-1}\}$.
Hence, when we compute $\varphi(e_1, \ldots, e_f)$ using the ordering $\sigma$ (and the InsideOut algorithm) we get the right
hand side of (18). Thus, $\varphi^\pi \neq \varphi$ as desired. □

The next definition realizes the intuition that if we construct the precedence tree using the
one-variable-at-a-time strategy (as opposed to the one-tag-block-at-a-time strategy), then we can interleave the linear
extensions of connected components arbitrarily and still get a variable ordering which is $\varphi$-equivalent with
the same FAQ-width. Since the interleaving can happen at any level, the definition is inductive.

Definition 6.10. (Component-wise equivalence) Let $\varphi$ be an FAQ-query where all variable aggregates are
semiring aggregates. Let $\sigma = (v_1, \ldots, v_n) \in$ EVO($\varphi$) be a variable ordering. Let $\pi = (u_1, \ldots, u_n)$ be another
variable ordering with $\{u_1, \ldots, u_f\} = F$. Then, $\pi$ is said to be component-wise equivalent (or CW-equivalent)
to $\sigma$ if and only if:

• either $n = 1$,

• or $\mathcal{H}$ has $\ge 2$ connected components, and for each connected component $C = (V(C), \mathcal{E}(C))$ of $\mathcal{H}$, $\pi_C$
is CW-equivalent to $\sigma_C$, where $\sigma_C$ (respectively, $\pi_C$) is the variable ordering of $V(C)$ that is consistent
with $\sigma$ (respectively, $\pi$),

• or $u_1 = v_1$, and for each connected component $C = (V(C), \mathcal{E}(C))$ of $\mathcal{H} - \{u_1\}$, $\pi_C$ is CW-equivalent to
$\sigma_C$, where $\sigma_C$ (respectively, $\pi_C$) is the ordering of $V(C)$ that is consistent with $\sigma$ (respectively, $\pi$).

Given a set of variable orderings $A \subseteq$ EVO($\varphi$), we use CWE($A$) to denote the set of all variable orderings
that are CW-equivalent to some variable ordering in $A$.
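
The inductive definition translates directly into a recursive test. The sketch below follows the three cases of Definition 6.10 under two simplifying assumptions that are not part of the definition: the query has no free variables, and the caller guarantees that the reference ordering `sigma` is already in EVO($\varphi$) (this is not checked). The function names are hypothetical, and hyperedges are given as Python sets of integer variable labels.

```python
from itertools import permutations


def components(vertices, edges):
    """Connected components of the hypergraph induced on `vertices`."""
    remaining, comps = set(vertices), []
    while remaining:
        start = min(remaining)
        comp, frontier = {start}, [start]
        while frontier:
            u = frontier.pop()
            for e in edges:
                if u in e:
                    for w in e:
                        if w in remaining and w not in comp:
                            comp.add(w)
                            frontier.append(w)
        comps.append(comp)
        remaining -= comp
    return comps


def restrict(order, comp):
    """Sub-ordering of `order` on the vertices of `comp`."""
    return tuple(v for v in order if v in comp)


def cw_equivalent(pi, sigma, edges):
    """Recursive test following the three cases of the definition."""
    if set(pi) != set(sigma):
        return False
    if len(pi) <= 1:  # case 1: a single variable
        return True

    def componentwise(comps):
        return all(
            cw_equivalent(restrict(pi, c), restrict(sigma, c), edges)
            for c in comps
        )

    comps = components(pi, edges)
    if len(comps) >= 2 and componentwise(comps):  # case 2: split by component
        return True
    # case 3: same first variable, then compare inside H - {u_1}
    return pi[0] == sigma[0] and componentwise(components(pi[1:], edges))


def cwe_closure(orderings, edges):
    """All permutations that are CW-equivalent to some given ordering."""
    variables = sorted(orderings[0])
    return {
        p for p in permutations(variables)
        if any(cw_equivalent(p, s, edges) for s in orderings)
    }
```

For the five-variable query discussed before Lemma 6.9, with factor supports $\{1,5\}, \{2,5\}, \{1,3\}, \{2,4\}$, the call `cw_equivalent((5, 1, 3, 2, 4), (5, 1, 2, 3, 4), edges)` returns true: after $X_5$ is fixed, the components $\{1,3\}$ and $\{2,4\}$ are compared independently.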

Proposition 6.11. Let $\pi$ be a variable ordering that is CW-equivalent to $\sigma \in$ EVO($\varphi$). Then, we have
$\pi \in$ EVO($\varphi$) and faqw($\sigma$) = faqw($\pi$).

Proof. Due to the fact that the different (conditional) connected components do not interact, when we run
InsideOut on $\sigma$ and $\pi$, for all $k \in [n]$ we have $U^\sigma_k = U^\pi_k$. This observation proves both claims. □

The following theorem shows the completeness part. 

Theorem 6.12 (EVO($\varphi$) = CWE(LinEx($P$))). A variable ordering $\sigma$ is $\varphi$-equivalent if and only if it is
CW-equivalent to some linear extension of the precedence poset $P$.

Proof. We only need to show that EVO($\varphi$) $\subseteq$ CWE(LinEx($P$)) because the reverse containment follows from
Proposition 6.11 and Theorem 6.8. Also, without loss of generality we can assume that the root of the
expression tree $P$ is empty, and it has one child node $L$. (If there were different connected components, they
can interleave arbitrarily and we prove each of them separately.)

Fix an arbitrary $\sigma = (v_1, \ldots, v_n) \in$ EVO($\varphi$). Then $v_1 \in L$ by Lemma 6.9. For each connected
component $C$ of $\mathcal{H} - \{v_1\}$, define a sub-query $\varphi_C$ with variable ordering $\sigma_C$ on the conditional factors
$\{\psi_S(\cdot \mid x_{v_1}) \mid S \in \mathcal{E} \wedge S \cap V(C) \neq \emptyset\}$, where $\sigma_C$ is the subsequence of $\sigma$ obtained by picking out vertices in
$V(C)$. Let $P_C$ be the precedence poset of the expression tree for $\varphi_C$. By induction on the number of variables,
we know $\sigma_C \in$ EVO($\varphi_C$) $\subseteq$ CWE(LinEx($P_C$)). Hence, there exists $\pi_C \in$ LinEx($P_C$) that is CW-equivalent to
$\sigma_C$.

The expression tree $P_C$ for $\varphi_C$ consists of an empty root (with ‘free’ tag). This root has only one child
node $L_C$. The subtree rooted at $L_C$ is called an $L_C$-subtree. Now, if we attach all the roots of all the
$L_C$-subtrees together, we will create a tree whose root $R$ is the union of the $L_C$. Then, if we add $v_1$ to the
root $R$ and add an empty parent to $R$, we obtain exactly the expression tree $P$.

From this observation, we can pick a variable ordering $\pi$ that is consistent with all $\pi_C$ such that $\pi$ starts
with $v_1$, followed by variables in $L - \{v_1\}$. It follows that $\pi \in$ LinEx($P$) and $\pi$ is CW-equivalent to $\sigma$. □

Now, we give an example of component-wise equivalence (Definition 6.10) and the role it plays in
completeness (Theorem 6.12).

Example 6.13 (Component-wise equivalence and completeness). Consider the following FAQ query with
two different semiring aggregates ($\sum$ and $\max$) and three variables, all of which are bound.

$$\varphi = \sum_{x_1} \max_{x_2} \sum_{x_3} \psi_{12}\,\psi_{13}.$$

($\psi_{ij}$ above denotes an input factor whose support is $\{X_i, X_j\}$.) For this query,

$$\text{EVO}(\varphi) = \{(1,2,3), (1,3,2), (3,1,2)\}.$$

Ignoring the dummy free variable $X_0$, the expression tree of $\varphi$ consists of a root with the tag '$\sum$' containing
the variables $\{X_1, X_3\}$ and a single child node with tag '$\max$' containing $\{X_2\}$. Therefore, $\text{LinEx}(P) =
\{(1,3,2), (3,1,2)\} \subseteq \text{EVO}(\varphi)$, as suggested by Theorem 6.8. Note that the original ordering $(1,2,3) \notin
\text{LinEx}(P)$. However, $(1,2,3)$ is component-wise equivalent to $(1,3,2)$ (see Definition 6.10). Therefore,

$$\text{CWE}(\text{LinEx}(P)) = \{(1,3,2), (1,2,3), (3,1,2)\} = \text{EVO}(\varphi),$$

just as predicted by Theorem 6.12. Moreover, by Proposition 6.11, $\text{faqw}((1,2,3)) = \text{faqw}((1,3,2)) = 1$.
Therefore, when searching for the variable ordering with the best FAQ-width, the ordering $(1,2,3)$ is
redundant, hence it is sufficient to consider $\text{LinEx}(P)$, just as suggested by Corollary 6.14.
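The set EVO of Example 6.13 can be probed empirically. The following is a minimal sketch, not part of the original development: it assumes the query is a sum over $x_1$, a max over $x_2$ and a sum over $x_3$ of the product of two binary factors on $\{X_1,X_2\}$ and $\{X_1,X_3\}$ with non-negative integer values, and it keeps exactly those orderings that agree with $(1,2,3)$ on a fixed sample of inputs. The names `evaluate`, `agreeing_orderings` and `sample_instances` are hypothetical. Agreement on a sample is only evidence of equivalence, whereas a single disagreement is a proof of non-equivalence.

```python
import itertools
import random

# Aggregate attached to each variable of Example 6.13: X1 -> sum, X2 -> max, X3 -> sum.
AGG = {1: sum, 2: max, 3: sum}

def evaluate(order, psi12, psi13, dom):
    """Evaluate the query with aggregates applied in `order` (outermost first)."""
    def rec(depth, env):
        if depth == len(order):
            return psi12[env[1], env[2]] * psi13[env[1], env[3]]
        var = order[depth]
        return AGG[var](rec(depth + 1, {**env, var: x}) for x in dom)
    return rec(0, {})

def agreeing_orderings(instances, dom):
    """Orderings that give the same value as (1, 2, 3) on every sampled instance."""
    reference = (1, 2, 3)
    keep = set()
    for order in itertools.permutations(reference):
        if all(evaluate(order, p12, p13, dom) == evaluate(reference, p12, p13, dom)
               for p12, p13 in instances):
            keep.add(order)
    return keep

def sample_instances(dom, count=30, seed=0):
    """One hand-picked separating instance plus `count` random ones (assumed sample)."""
    rng = random.Random(seed)
    pairs = list(itertools.product(dom, repeat=2))
    # psi12 = [x1 == x2], psi13 = 1 separates the three non-equivalent orderings.
    instances = [({p: int(p[0] == p[1]) for p in pairs}, {p: 1 for p in pairs})]
    for _ in range(count):
        instances.append(({p: rng.randint(0, 3) for p in pairs},
                          {p: rng.randint(0, 3) for p in pairs}))
    return instances

DOM = (0, 1)
EVO_SAMPLED = agreeing_orderings(sample_instances(DOM), DOM)
```

On this sample the surviving orderings coincide with the set EVO listed in the example; the hand-picked instance alone rules out the other three permutations.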

Proposition 6.11 and Theorem 6.12 imply the following result, which is precisely what we need to
approximate $\text{faqw}(\varphi)$ in Section 7.1.

Corollary 6.14. $\text{faqw}(\varphi) = \min\{\text{faqw}(\sigma) \mid \sigma \in \text{LinEx}(P)\}$.


6.2 FAQ with an inner FAQ-formula closed under idempotent elements 

Now, we generalize the results of Section 6.1 to FAQ-expressions that have “idempotent” product aggregates 
(in addition to semiring aggregates). In particular, this section considers FAQ-expressions of the following 
special form (that is still more general than that of Section 6.1). 


6.2.1 Problem specification 

Before defining the special case of FAQ that we solve in this section, we need to define and clarify a couple
of concepts. Let $\mathbf{D}_I \subseteq \mathbf{D}$ be a set of idempotent elements of $\otimes$ under the domain $\mathbf{D}$. A variable aggregate
$\oplus$ is said to be closed under $\mathbf{D}_I$ if $a \oplus b \in \mathbf{D}_I$ whenever both $a$ and $b$ belong to $\mathbf{D}_I$. (Recall that $\oplus$ may or
may not be the same as $\otimes$.) In the FAQ context, the two elements $\mathbf{0}$ and $\mathbf{1}$ are idempotent elements of $\otimes$.
Hence, the canonical example is $\mathbf{D}_I = \{\mathbf{0}, \mathbf{1}\}$: $\max$ and $\times$ are closed under $\{0,1\}$, $\vee$ and $\wedge$ are closed under
$\{\text{true}, \text{false}\}$, $\cup$ and $\cap$ are closed under $\{\mathcal{U}, \emptyset\}$, etc. If $\mathbf{D}$ is a matrix domain then there might be more than
two idempotent elements under matrix multiplication.

Note that if a semiring aggregate $\oplus$ is closed under $\mathbf{D}_I$, then it is also closed under $\mathbf{D}_I \cup \{\mathbf{0}\}$; however,
it is not necessarily closed under $\mathbf{D}_I \cup \{\mathbf{1}\}$. On the other hand, a product aggregate is always closed under
$\mathbf{D}_I \cup \{\mathbf{0}, \mathbf{1}\}$.
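The closure and idempotence conditions above are easy to check on small finite sets. The sketch below is illustrative only; the helper names `is_idempotent_set` and `is_closed` are hypothetical, and ordinary integer arithmetic stands in for the abstract operators.

```python
import itertools
import operator

def is_idempotent_set(product, subset):
    """True if every element a of `subset` satisfies product(a, a) == a."""
    return all(product(a, a) == a for a in subset)

def is_closed(op, subset):
    """True if op(a, b) stays inside `subset` for all a, b in `subset`."""
    return all(op(a, b) in subset for a, b in itertools.product(subset, repeat=2))

D_I = {0, 1}

checks = {
    "0 and 1 are idempotent under x": is_idempotent_set(operator.mul, D_I),
    "max closed under {0,1}": is_closed(max, D_I),
    "x closed under {0,1}": is_closed(operator.mul, D_I),
    "+ closed under {0,1}": is_closed(operator.add, D_I),
    # + is closed under {0}, yet adding the element 1 breaks closure.
    "+ closed under {0}": is_closed(operator.add, {0}),
}
```

The last two entries illustrate the remark that closure of a semiring aggregate survives adding $\mathbf{0}$ but not necessarily adding $\mathbf{1}$.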

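Definition 6.15 (Identical aggregates under a subset $\mathbf{D}' \subseteq \mathbf{D}$). Given a set $\mathbf{D}' \subseteq \mathbf{D}$ and two aggregates
$\oplus, \bar{\oplus}$ (that are not necessarily closed under $\mathbf{D}'$), $\oplus, \bar{\oplus}$ are said to be identical under $\mathbf{D}'$ if for all $a, b \in \mathbf{D}'$,
we have

$$a \oplus b = a \mathbin{\bar{\oplus}} b.$$

Note that two aggregates might be identical under $\mathbf{D}'$ but not under $\mathbf{D}$ (by accident).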
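A small sketch of Definition 6.15 follows. The example operators are an assumption chosen for illustration: `max` and the "probabilistic or" $a + b - ab$ are two arithmetic interpretations of logical $\vee$ that coincide on $\{0,1\}$ but not on a larger set. The helper name `identical_under` is hypothetical.

```python
import itertools

def identical_under(op, op_bar, subset):
    """True if op(a, b) == op_bar(a, b) for all a, b in `subset`."""
    return all(op(a, b) == op_bar(a, b)
               for a, b in itertools.product(subset, repeat=2))

def prob_or(a, b):
    # Another arithmetic reading of logical OR; agrees with max on {0, 1}.
    return a + b - a * b

same_on_bits = identical_under(max, prob_or, {0, 1})
same_on_larger = identical_under(max, prob_or, {0, 1, 2})
```

Here the two operators are identical under $\{0,1\}$ "by accident": on $\{0,1,2\}$ they differ, for instance at $a = b = 2$.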

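In this section, we consider an FAQ-query $\varphi$ of the following form:

$$\varphi(x_{[f]}) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(f+\ell)}_{x_{f+\ell}} \; \bigoplus^{(f+\ell+1)}_{x_{f+\ell+1}} \cdots \bigoplus^{(n)}_{x_n} \; \bigotimes_{S \in \mathcal{E}} \psi_S(x_S), \qquad (21)$$

where

• All input factors have range $\mathbf{D}_I$, such that $\mathbf{D}_I$ is a set of idempotent elements of $\otimes$, $\{\mathbf{0}, \mathbf{1}\} \subseteq \mathbf{D}_I$, and
$\otimes$ is closed under $\mathbf{D}_I$.^13

• $0 \leq \ell \leq n - f$ is an integer,

• For $f + 1 \leq i \leq f + \ell$, $\oplus^{(i)}$ is an arbitrary semiring aggregate,

• For $f + \ell + 1 \leq i \leq n$, $\oplus^{(i)}$ is closed under $\mathbf{D}_I$. (It could be either a product or semiring aggregate.)

• For every $f + 1 \leq i, j \leq n$ such that $\oplus^{(i)}$ and $\oplus^{(j)}$ are semiring aggregates, if $\oplus^{(i)}$ and $\oplus^{(j)}$ are not
identical under $\mathbf{D}$, then they are not identical under $\mathbf{D}_I$.

^13 If $\mathbf{D}_I$ is the set of all idempotent elements of $\otimes$, then $\otimes$ is closed under $\mathbf{D}_I$, because for any two elements $a, b \in \mathbf{D}_I$, we
have $(a \otimes b) \otimes (a \otimes b) = a \otimes (b \otimes b) \otimes a = a \otimes b \otimes a = (a \otimes a) \otimes b = a \otimes b$.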


A compact way to write the above FAQ expression is

$$\varphi(x_{[f]}) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(f+\ell)}_{x_{f+\ell}} \; \varphi'(x_{[f+\ell]}),$$

where $\varphi'$ is a general FAQ expression whose aggregates are closed under $\mathbf{D}_I$. This special case of FAQ
captures the QCQ and #QCQ instances from Example A.20 and Example 1.3, both of which can be written
in the above form.


Example 6.16 (#QCQ-revisited). In #QCQ, $\varphi'$ is basically a QCQ formula. In particular, although the
input factors have range $\{0,1\}$, the output is computed under the range $\mathbf{D} = \mathbb{N}$. Hence by choosing
$\mathbf{D}_I = \{0,1\}$, we can compute the "QCQ-part" of #QCQ over $\mathbf{D}_I$ exploiting the fact that all product
aggregates are idempotent (as in Definition 5.2), and the "#-part" of #QCQ over the full domain $\mathbf{D}$ while
enjoying the fact that there are no product aggregates.

More specifically, in #QCQ we have two aggregates ($\max$, $\times$) that are closed under $\mathbf{D}_I$ (those are
equivalent to the logical $\vee$, $\wedge$), and one aggregate $+$ which is not (e.g. $1 + 1 \notin \{0,1\}$). Notice that because of this, $+$
cannot be identical under $\mathbf{D}_I$ to either $\max$ or $\times$ (even if we use any different arithmetic interpretation of the
logical $\vee$, $\wedge$ other than $\max$, $\times$). Hence, #QCQ satisfies the last condition above in the problem formulation.

In this special case of FAQ, as long as the variable ordering lists $X_{[f+\ell]}$ first (in any order), all
product aggregates are idempotent (see Definition 5.2). Our aim is to find a good variable ordering for $\varphi$ in
the set $\text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$, where $\mathcal{F}(\mathbf{D}_I)$ is the set of factors whose range is $\mathbf{D}_I$ (see Definition 5.8). Notice that
in this context, $\text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$ is a stronger notion and harder to deal with than $\text{EVO}(\varphi)$. In particular, a
variable ordering $\sigma$ can still be in $\text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$ even if the output evaluates differently under $\sigma$ for some
input factors whose range is $\mathbf{D}$, as long as those factors don't have range $\mathbf{D}_I \subseteq \mathbf{D}$.


6.2.2 Non-commutative aggregates vs. non-identical aggregates 

From the discussion of Section 6.1.2, a necessary condition for two variable aggregates $\oplus$, $\bar{\oplus}$ (possibly products)
to be commutative under $\mathbf{D}_I$ is that for all $a, b, c, d \in \mathbf{D}_I$, we have

$$(a \oplus b) \mathbin{\bar{\oplus}} (c \oplus d) = (a \mathbin{\bar{\oplus}} c) \oplus (b \mathbin{\bar{\oplus}} d).$$

(Notice that $\oplus$, $\bar{\oplus}$ are not necessarily closed under $\mathbf{D}_I$, and that is why the above condition is not sufficient.)
We recognize two cases:

• If both $\oplus$ and $\bar{\oplus}$ are semiring aggregates, then by selecting $a = d = \mathbf{0}$ (since $\mathbf{0} \in \mathbf{D}_I$), we obtain
$b \mathbin{\bar{\oplus}} c = b \oplus c$ for every $b, c \in \mathbf{D}_I$. Hence, no two semiring aggregates commute under $\mathbf{D}_I$ unless they are
identical under $\mathbf{D}_I$. However, our problem formulation requires any two semiring aggregates that are
identical under $\mathbf{D}_I$ to be identical under $\mathbf{D}$ as well.

• If exactly one of the two aggregates is a product, say $\bar{\oplus} = \otimes$, then by selecting $a = d = \mathbf{0}$ and $b = c = \mathbf{1}$,
we get a contradiction ($\mathbf{1} = \mathbf{0}$). Hence, $\oplus$, $\bar{\oplus}$ do not commute under $\mathbf{D}_I$.
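The necessary condition above can be tested exhaustively on a small set. This is a sketch under the assumption that both operators are ordinary Python binary functions on integers; `satisfies_exchange` is a hypothetical name for the check $(a \oplus b) \mathbin{\bar{\oplus}} (c \oplus d) = (a \mathbin{\bar{\oplus}} c) \oplus (b \mathbin{\bar{\oplus}} d)$.

```python
import itertools
import operator

def satisfies_exchange(op, op_bar, subset):
    """Check (a op b) op_bar (c op d) == (a op_bar c) op (b op_bar d) on `subset`."""
    return all(op_bar(op(a, b), op(c, d)) == op(op_bar(a, c), op_bar(b, d))
               for a, b, c, d in itertools.product(subset, repeat=4))

D_I = {0, 1}

exchange = {
    ("max", "max"): satisfies_exchange(max, max, D_I),           # identical aggregates
    ("max", "+"): satisfies_exchange(max, operator.add, D_I),    # two distinct semiring aggregates
    ("+", "x"): satisfies_exchange(operator.add, operator.mul, D_I),  # semiring vs product
    ("max", "x"): satisfies_exchange(max, operator.mul, D_I),    # semiring vs product
}
```

Both product cases fail at $a = d = 0$, $b = c = 1$, which is the witness used in the second bullet.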

We infer the following variant of Proposition 6.7. (Notice that the range of $\phi_{ij}$ is now $\mathbf{D}_I$ instead of $\mathbf{D}$.)

Proposition 6.17. Suppose that $\oplus$ and $\bar{\oplus}$ are either semiring aggregates that are not identical under $\mathbf{D}$ or
one of them is a product aggregate (while the other is not). Then for every $i, j \in [n]$, there is a function
$\phi_{ij} : \text{Dom}(X_i) \times \text{Dom}(X_j) \to \mathbf{D}_I$ for which

$$\bigoplus_{x_i \in \text{Dom}(X_i)} \; \bar{\bigoplus}_{x_j \in \text{Dom}(X_j)} \phi_{ij}(x_i, x_j) \;\neq\; \bar{\bigoplus}_{x_j \in \text{Dom}(X_j)} \; \bigoplus_{x_i \in \text{Dom}(X_i)} \phi_{ij}(x_i, x_j). \qquad (22)$$

Proof. If both $\oplus$ and $\bar{\oplus}$ are semiring aggregates, then $\phi_{ij}$ is defined exactly the same as in the proof of
Proposition 6.7. If exactly one aggregate is a product, say $\bar{\oplus} = \otimes$, then $\phi_{ij}$ is defined as follows. Fix
arbitrarily elements $x_i^1 \neq x_i^2 \in \text{Dom}(X_i)$, $x_j^1 \in \text{Dom}(X_j)$ such that $x_j^1$ is not the last in $\text{Dom}(X_j)$ (i.e. there
is $x_j^2 \in \text{Dom}(X_j)$ such that $x_j^2 > x_j^1$). Define

$$\phi_{ij}(x_i, x_j) = \begin{cases} \mathbf{1} & \text{if } (x_i = x_i^1 \wedge x_j \leq x_j^1) \text{ or } (x_i = x_i^2 \wedge x_j > x_j^1) \\ \mathbf{0} & \text{otherwise.} \end{cases} \qquad (23)$$

Then,

$$\bigoplus_{x_i} \bigotimes_{x_j} \phi_{ij}(x_i, x_j) = \mathbf{0} \neq \mathbf{1} = \bigotimes_{x_j} \bigoplus_{x_i} \phi_{ij}(x_i, x_j).$$

□
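The witness function from the proof can be checked numerically. The sketch below assumes the instantiation in which the semiring aggregate is ordinary $+$ and the product is ordinary $\times$ over integers, reads the first branch of (23) as $x_j \leq x_j^1$, and uses small hypothetical domains; the function names are illustrative.

```python
import math

def witness(xi, xj, xi1, xi2, xj1):
    """The 0/1 function of (23): 1 on (xi1, xj <= xj1) and on (xi2, xj > xj1)."""
    return 1 if (xi == xi1 and xj <= xj1) or (xi == xi2 and xj > xj1) else 0

def sum_then_product(dom_i, dom_j, phi):
    """Outer sum over x_i, inner product over x_j."""
    return sum(math.prod(phi(xi, xj) for xj in dom_j) for xi in dom_i)

def product_then_sum(dom_i, dom_j, phi):
    """Outer product over x_j, inner sum over x_i."""
    return math.prod(sum(phi(xi, xj) for xi in dom_i) for xj in dom_j)

DOM_I = (0, 1, 2)
DOM_J = (0, 1, 2)
# Assumed choices: x_i^1 = 0, x_i^2 = 1, and x_j^1 = 0 (not the last element of DOM_J).
phi = lambda xi, xj: witness(xi, xj, xi1=0, xi2=1, xj1=0)

lhs = sum_then_product(DOM_I, DOM_J, phi)
rhs = product_then_sum(DOM_I, DOM_J, phi)
```

Every row of the table contains a zero, so the sum of row products vanishes, while every column contains exactly one 1, so the product of column sums is 1.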


6.2.3 Expression tree and precedence poset 

The approach in this section and the next mirrors that of Section 6.1. However, dealing with the product
aggregates (under the $\varphi'$ part of the query) requires extra care. In particular, the corresponding variables
do not play a role in determining the connected components when we construct the expression tree. (They
do not belong to the set $K$ defined by (13).) And, they do not contribute directly to the width $\text{faqw}(\sigma)$,
because $\text{faqw}(\sigma)$ is defined only on $U_k$ for $k \in K$. While our algorithm was designed for FAQ, it might be
helpful if the reader uses #QCQ as the running example for this section.

Definition 6.18 (Expression tree). The expression tree for $\varphi$ is a rooted tree $P$. Every node of the tree
is a set of variables, and the tree is constructed via compartmentalization and then compression. While the
compression step remains identical to that in Section 6.1, the compartmentalization step is trickier.

Compartmentalization. In this step, initially we start off with the sequence of variables with their
corresponding tags exactly as written in (21). We also apply the same trick of adding a dummy free variable
$X_0$ as was done in Section 6.1, i.e. we start off with the sequence

$$\sigma = \big((X_0, \text{"free"}), (X_1, \text{"free"}), \dots, (X_f, \text{"free"}), (X_{f+1}, \oplus^{(f+1)}), \dots, (X_n, \oplus^{(n)})\big)$$

and with the hypergraph $\mathcal{H}$ which has an extra isolated vertex $X_0$ marked with a "free" tag.

Now given a tagged variable sequence $\sigma$ and a hypergraph $\mathcal{H}$, we build the tree by constructing a node
$L$ containing the first tag block. Then, we do the following.

• Let $W$ be the set of all product variables in $\sigma$ that do not belong to $L$.

• For each connected component $C = (V(C), \mathcal{E}(C))$ of $\mathcal{H} - L - W$ we construct a hypergraph $(V'(C), \mathcal{E}'(C))$
called the extended component of $C$ as follows. We set

$$V'(C) = V(C) \cup \{w \in W \mid \exists S \in \mathcal{E} \text{ where } S \cap V(C) \neq \emptyset \text{ and } w \in S\}$$

$$\mathcal{E}'(C) = \{S \cap V'(C) \mid S \in \mathcal{E},\ S \cap V(C) \neq \emptyset\}$$

• After all the extended components are constructed, we construct a special set $D$ of variables called the
dangling product variable set, which is defined to be

$$D := \bigcup_{S \in \mathcal{E},\ (S \setminus L) \subseteq W} (S \cap W).$$

• Now, for each extended component $(V'(C), \mathcal{E}'(C))$, we construct a sequence $\sigma_C$ (of tagged variables) by
listing all variables in $V'(C)$ in exactly the same relative order they appeared in $\sigma$. From the sequence
$\sigma_C$ and the hypergraph $(V'(C), \mathcal{E}'(C))$ we recursively construct the expression tree $P_C$.

• Finally, we connect all subtrees $P_C$ to the node $L$. And, we create a node that contains $D$ (all
dangling product variables) and connect it to $L$ also.

• The dummy variable $X_0$ is removed after everything is done. This completes the compartmentalization
step. Note that after compartmentalization all variables in the same node of $P$ have the same tag.

Compression. Now, in the expression tree P that resulted from the compartmentalization step, as long 
as there is still a node L whose tag is the same as a child node L' of the tree P, we merge the child into L. 
Repeat this step until no further merging is possible. 

Before proceeding with the proof, we present an example to illustrate the computation of the expression 
tree in the presence of product aggregates. 

Example 6.19 (Intuition behind expression tree). Consider the following FAQ query, where all factors have
range $\{0,1\}$.

$$\varphi = \max_{x_1} \max_{x_2} \sum_{x_3} \sum_{x_4} \prod_{x_5} \max_{x_6} \prod_{x_7} \max_{x_8} \psi_{13}\psi_{24}\psi_{34}\psi_{15}\psi_{16}\psi_{26}\psi_{257}\psi_{167}\psi_{278}.$$

In this example, $\mathbf{D}_I = \{0,1\}$ while $\mathbf{D} = \mathbb{N}$. Both $\max$ and $\times$ are closed under $\{0,1\}$, while $+$ is not. Hence,
by comparison to the format in (21), $f = 0$ and $\ell$ should be taken as 4. Notice that because $\max$ is closed
under $\mathbf{D}_I$ while $+$ is not, $\max$ and $+$ are not identical under $\mathbf{D}_I$. Hence, the last condition in the problem
specification is satisfied. Figure 4 illustrates the hypergraph $\mathcal{H}$ corresponding to this FAQ instance.


Figure 4: The hypergraph $\mathcal{H}$ corresponding to an FAQ instance $\varphi$ with the factors having the range $\{0,1\}$.


Figure 5: The extended components for the FAQ instance $\varphi$ from Figure 4. Ignoring the dummy free variable
$X_0$ in this case, $L = \{1, 2\}$ and the set of product nodes that are not contained in $L$ is defined by $W = \{5, 7\}$.
The vertices in each of the three extended components are the ones with a solid circle. A (hyper)edge is drawn
with a solid line if it is completely contained inside the vertices of the extended component and is drawn
with a dotted line if it is only partially contained inside the vertices of the extended component. The
corresponding FAQ instances are shown below the components: $\sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34}$,
$\max_{x_6}\prod_{x_7} \psi_{16}\psi_{26}\psi_{167}$, and $\prod_{x_7}\max_{x_8} \psi_{278}$.

Figure 6: Construction of the expression tree for $\varphi$ from Figure 4. The left panel is the compartmentalization
(top left being the tree after the first level of recursion and the bottom left being the tree after the second
and final level of recursion) while the right panel is the compression step. Each node is labeled with the
corresponding tag in blue (if all the nodes in the tree have the same tag). The dangling node $D$ is marked
in orange on the top left. The dummy free variable $X_0$ is ignored in this example.


Because in this example $\varphi$ consists of a single extended component, we can ignore the dummy free variable
$X_0$ in the construction of the expression tree. In the compartmentalization step, we try to symbolically
decompose the problem. In particular, due to the fact that all product aggregates are idempotent, we can
decompose the above problem gradually as follows. Note that, for brevity, the scope of each aggregate is
the entire expression to its right; if we want to be more specific about the limit of the scope we put a
sub-expression inside a pair of parentheses.

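$$\begin{aligned}
\varphi
&= \max_{x_1}\max_{x_2}\sum_{x_3}\sum_{x_4}\prod_{x_5}\max_{x_6}\prod_{x_7}\max_{x_8} \psi_{13}\psi_{24}\psi_{34}\psi_{15}\psi_{16}\psi_{26}\psi_{257}\psi_{167}\psi_{278} \\
&= \max_{x_1}\max_{x_2}\sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34} \prod_{x_5}\max_{x_6}\prod_{x_7}\max_{x_8} \psi_{15}\psi_{16}\psi_{26}\psi_{257}\psi_{167}\psi_{278} \\
&= \max_{x_1}\max_{x_2}\sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34} \prod_{x_5} \psi_{15} \max_{x_6} \psi_{16}\psi_{26} \prod_{x_7} \psi_{257}\psi_{167} \max_{x_8} \psi_{278} \\
&= \max_{x_1}\max_{x_2}\sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34} \prod_{x_5} \psi_{15} \max_{x_6} \psi_{16}\psi_{26}
   \Big(\prod_{x_7}\psi_{257}\Big)\cdot\Big(\prod_{x_7}\psi_{167}\Big)\cdot\Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big) \\
&= \max_{x_1}\max_{x_2} \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot \sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34}
   \prod_{x_5}\Big(\psi_{15}\prod_{x_7}\psi_{257}\Big)\cdot \max_{x_6}\Big(\psi_{16}\psi_{26}\prod_{x_7}\psi_{167}\Big) \\
&= \max_{x_1}\max_{x_2} \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot \Big(\max_{x_6}\psi_{16}\psi_{26}\prod_{x_7}\psi_{167}\Big)\cdot
   \sum_{x_3}\sum_{x_4} \psi_{13}\psi_{24}\psi_{34} \prod_{x_5}\Big(\psi_{15}\cdot\prod_{x_7}\psi_{257}\Big) \\
&= \max_{x_1}\max_{x_2} \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot \Big(\max_{x_6}\prod_{x_7}\psi_{16}\psi_{26}\psi_{167}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7}\psi_{15}\psi_{257}\Big).
\end{aligned}$$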

The set $D = \{X_5, X_7\}$ is the set of dangling product variables. When conditioned on $(x_1, x_2)$ the inner
instance factorizes into a product of four independent FAQ queries. The extended components along with
the corresponding FAQ factors are shown in Figure 5. The intermediate expression tree after this first level
of recursion is shown on top left in Figure 6. After recursively constructing the expression trees for each of
the sub-queries, we can connect them to the root node $L = \{x_1, x_2\}$ of the expression tree (as in bottom left
of Figure 6). This is only the compartmentalization step. Everything is done at a symbolic level; no real
computation is involved.
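The first level of compartmentalization in this example can be reproduced mechanically. The following is a minimal sketch of a single compartmentalization step only (no recursion and no compression), assuming hyperedges are given as sets of variable indices and that the first tag block $L$ and the product variables $W$ outside $L$ are supplied by the caller; the function names are hypothetical.

```python
def connected_components(vertices, edges):
    """Connected components of the hypergraph induced on `vertices`."""
    vertices = set(vertices)
    remaining, components = set(vertices), []
    while remaining:
        frontier = [remaining.pop()]
        comp = set(frontier)
        while frontier:
            v = frontier.pop()
            for S in edges:
                if v in S:
                    new = (S & vertices) - comp
                    comp |= new
                    frontier.extend(new)
        remaining -= comp
        components.append(comp)
    return components

def compartmentalize_once(edges, L, W):
    """Return the extended components (V'(C), E'(C)) and the dangling set D."""
    edges = [frozenset(S) for S in edges]
    all_vertices = set().union(*edges)
    core = all_vertices - L - W          # vertices of H - L - W
    extended = []
    for comp in connected_components(core, edges):
        touching = [S for S in edges if S & comp]
        v_ext = comp | {w for S in touching for w in S & W}
        e_ext = {S & v_ext for S in touching}
        extended.append((v_ext, e_ext))
    dangling = set()
    for S in edges:
        if (S - L) <= W:                 # S \ L consists of product variables only
            dangling |= S & W
    return extended, dangling

# Factor scopes of Example 6.19, with L = {1, 2} and W = {5, 7}.
EDGES = [{1, 3}, {2, 4}, {3, 4}, {1, 5}, {1, 6}, {2, 6}, {2, 5, 7}, {1, 6, 7}, {2, 7, 8}]
EXTENDED, DANGLING = compartmentalize_once(EDGES, L={1, 2}, W={5, 7})
```

On this input the sketch yields the three extended components with vertex sets $\{3,4\}$, $\{6,7\}$, $\{7,8\}$ and the dangling set $\{5,7\}$, in line with Figure 5 and the text above.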

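Next, let us illustrate what the compression step does. We re-arrange the above expression for $\varphi$.

$$\begin{aligned}
\varphi
&= \max_{x_1}\max_{x_2} \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot \Big(\max_{x_6}\prod_{x_7}\psi_{16}\psi_{26}\psi_{167}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7}\psi_{15}\psi_{257}\Big) \\
&= \max_{x_1}\max_{x_2} \Big(\max_{x_6}\prod_{x_7}\psi_{16}\psi_{26}\psi_{167}\Big)\cdot \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7}\psi_{15}\psi_{257}\Big) \\
&= \max_{x_1}\max_{x_2} \Big(\max_{x_6}\psi_{16}\psi_{26}\prod_{x_7}\psi_{167}\Big)\cdot \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7}\psi_{15}\psi_{257}\Big) \\
&= \max_{x_1}\max_{x_2}\max_{x_6} \psi_{16}\psi_{26} \Big(\prod_{x_7}\psi_{167}\Big)\cdot \Big(\prod_{x_7}\max_{x_8}\psi_{278}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7}\psi_{15}\psi_{257}\Big).
\end{aligned}$$

Un-raveling an inner aggregate to become an outer aggregate corresponds to a merge of a child node to the
parent node in the compression step. The final expression tree is shown on right in Figure 6.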
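The rewriting above can be sanity-checked by brute force on random $\{0,1\}$-valued factors. This is a sketch under stated assumptions: every variable has the two-element domain $\{0,1\}$, the aggregates are `max`, `sum` and `math.prod` in the order of Example 6.19, and the names `faq`, `compressed` and `random_factors` are hypothetical helpers, not part of InsideOut.

```python
import itertools
import math
import random

DOM = (0, 1)
SCOPES = [(1, 3), (2, 4), (3, 4), (1, 5), (1, 6), (2, 6), (2, 5, 7), (1, 6, 7), (2, 7, 8)]
# Aggregates of Example 6.19, outermost first.
ORIGINAL = [(1, max), (2, max), (3, sum), (4, sum),
            (5, math.prod), (6, max), (7, math.prod), (8, max)]

def random_factors(rng):
    """Random 0/1 tables, one per scope (biased towards 1 to avoid all-zero outputs)."""
    return {S: {xs: int(rng.random() < 0.8) for xs in itertools.product(DOM, repeat=len(S))}
            for S in SCOPES}

def faq(aggs, scopes, psi, env):
    """Apply `aggs` (outermost first) to the product of the factors in `scopes`."""
    if not aggs:
        return math.prod(psi[S][tuple(env[v] for v in S)] for S in scopes)
    (var, agg), rest = aggs[0], aggs[1:]
    return agg(faq(rest, scopes, psi, {**env, var: x}) for x in DOM)

def compressed(psi):
    """Evaluate the final, compressed form: max over x1, x2, x6 of five independent parts."""
    best = 0
    for x1, x2, x6 in itertools.product(DOM, repeat=3):
        env = {1: x1, 2: x2, 6: x6}
        value = (faq([], [(1, 6), (2, 6)], psi, env)
                 * faq([(7, math.prod)], [(1, 6, 7)], psi, env)
                 * faq([(7, math.prod), (8, max)], [(2, 7, 8)], psi, env)
                 * faq([(3, sum), (4, sum)], [(1, 3), (2, 4), (3, 4)], psi, env)
                 * faq([(5, math.prod), (7, math.prod)], [(1, 5), (2, 5, 7)], psi, env))
        best = max(best, value)
    return best

def agree_on_samples(trials=25):
    """Compare the original nested evaluation with the compressed form on random inputs."""
    return all(faq(ORIGINAL, SCOPES, psi, {}) == compressed(psi)
               for psi in (random_factors(random.Random(seed)) for seed in range(trials)))
```

Note that $X_7$ appears in three separate sub-expressions of `compressed`, each with its own product over $x_7$; this relies on all factor values being idempotent under multiplication.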


Unlike free and semiring variables, it is possible for the same product variable to appear multiple times
in the expression tree. In Example 6.19, $X_7$ occurs three times in three different sub-expressions, and we can
think of them as three different variables $X_7$, $X_7'$, and $X_7''$ which all have domains $\text{Dom}(X_7) = \text{Dom}(X_7') =
\text{Dom}(X_7'')$ and rewrite $\varphi$ as:

$$\varphi = \max_{x_1}\max_{x_2}\max_{x_6} \psi_{16}\psi_{26} \Big(\prod_{x_7}\psi_{167}\Big)\cdot \Big(\prod_{x_7'}\max_{x_8}\psi_{27'8}\Big)\cdot
   \Big(\sum_{x_3}\sum_{x_4}\psi_{13}\psi_{24}\psi_{34}\Big)\cdot \Big(\prod_{x_5}\prod_{x_7''}\psi_{15}\psi_{257''}\Big).$$

Luckily, the distribution of product variables (with their multiple copies) in the expression tree exhibits some
nice properties, some of which do not even hold for semiring variables. Those nice properties enable us to
take care of multiple copies of product variables.

In particular, given integers $i, j \in [n] - [f]$, $i < j$, where $\oplus^{(i)} = \oplus^{(j)}$ is a semiring aggregate, it is possible
for $X_j$ to be a strict ancestor of $X_i$ in the expression tree, due to the compression step. ($X_j$ is a strict
ancestor of $X_i$ if $X_j \in L$, $X_i \in L'$, and $L$ is a strict ancestor of $L'$ in the expression tree.) For example,
consider the following FAQ-query.

$$\varphi = \bigoplus_{x_1} \bar{\bigoplus}_{x_2} \bigoplus_{x_3} \bigoplus_{x_4} \psi_{12}\psi_{23}\psi_{14}.$$

The expression tree roughly corresponds to the following re-writing of the query:

$$\varphi = \bigoplus_{x_1} \bigoplus_{x_4} \Big[\bar{\bigoplus}_{x_2} \Big(\bigoplus_{x_3} \psi_{12}\psi_{23}\psi_{14}\Big)\Big].$$

Although $X_3$ comes before $X_4$ in the original expression, $X_4$ is now a strict ancestor of $X_3$ in the expression
tree. However, the following lemma says that the above scenario is impossible if $\oplus^{(i)} = \oplus^{(j)}$ is a product
aggregate. The lemma also allows us to show that the comparison relation formed by the expression tree is
indeed a partial order.

Lemma 6.20. The expression tree satisfies the following properties.

(a) For any $i \in [n] - [f]$ such that $\oplus^{(i)} = \otimes$, no copy of $X_i$ is a strict ancestor of another copy of $X_i$.

(b) For any $i < j \in [n] - [f]$ such that $\oplus^{(i)} = \oplus^{(j)} = \otimes$, no copy of $X_j$ is a strict ancestor of any copy of
$X_i$.
Proof. The proof is by induction on the number of tag blocks. In the base case where there is only one tag
block, the expression tree has only one node and the lemma follows trivially.

In the inductive step, suppose $\sigma$ has at least two tag blocks with $L$ being the first tag block. Then, each
sub-sequence $\sigma_C$ for each extended component $(V'(C), \mathcal{E}'(C))$ defines an FAQ-expression $\varphi_C$. The expression
$\varphi_C$ is on the conditional factors $\psi_S(\cdot \mid x_L)$ for each $S \in \mathcal{E}$ such that $S \cap V(C) \neq \emptyset$. For the dangling component
$D$, the expression $\varphi_D$ is defined similarly. The (compressed) expression tree for $\varphi$ can be constructed by
taking all the (compressed) expression trees for all $\varphi_C$ (and $\varphi_D$), connecting all their roots to the new root
$L$, and then merging into $L$ the children that have the same tag. Consider the following two cases.

Case 1. In the case where the tag of $L$ is a semiring tag, consider two different expressions $\varphi_C, \varphi_{C'}$ for
two extended components $C, C'$. Although $C$ and $C'$ could contain copies of the same product variable(s),
we can only merge the root of the expression tree for $\varphi_C$ (or $\varphi_{C'}$) into $L$ when this root contains the same
semiring tag as that of $L$. Therefore even after merging, no (copy of a) product variable in $C$ can be a strict
ancestor of any (copy of a) product variable in $C'$. Assuming the lemma holds for the expression trees of all
$\varphi_C$, it will hold for the expression tree of $\varphi$.

Case 2. If the tag of $L$ is $\otimes$, we claim that there cannot be more than one extended component $C$.
Roughly, this is because in the construction of the expression tree, product variables are not taken into
account while determining connected components. Hence, removing them does not increase the number of
extended components. Assuming the lemma holds for $\varphi_C$, it will hold for $\varphi$.

Let us elaborate more on the above claim. Because we are adding a dummy free variable $X_0$, the first tag
block $F$ always has the tag "free". After we condition on $F$, we recursively construct an expression tree for
each extended component. For each one of those components, we condition on the first block and recursively
construct an expression tree for each extended component, and so on. Hence, apart from the time when the first block
was $F$, we can always assume that the hypergraph $\mathcal{H}$ consists of only one extended component.

Now, suppose $\mathcal{H}$ is a single extended component; how did we get to $\mathcal{H}$ in the first place? In the previous
stage, we had a hypergraph $\mathcal{H}'$ with $L'$ being the first tag block (not necessarily a product block) and $W'$
being the product vertices in $\mathcal{H}' - L'$. We obtained $\mathcal{H}$ by taking a single connected component of $\mathcal{H}' - L' - W'$
and extending it back with vertices from $W'$. Hence, $\mathcal{H} - W'$ is connected. Let $L$ be the first tag block
of $\mathcal{H}$ and suppose it consists of product vertices. Let $W$ be the product vertices in $\mathcal{H} - L$. Note that
$L \cup W \subseteq W'$. We obtained the extended components of $\mathcal{H} - L$ by taking the connected components of
$\mathcal{H} - (L \cup W)$ and then adding back vertices from $W$. However, $\mathcal{H} - (L \cup W)$ is connected because $\mathcal{H} - W'$ is
connected. Hence, when we add back vertices from $W$, we get $\mathcal{H} - L$ as a single extended component; this
proves the claim. □

Now we are ready to show that the expression tree defines a partial order on the variables. Let $P$ be
the expression tree of $\varphi$ as defined in Definition 6.18. Define a binary relation $\preceq_P \subseteq \mathcal{V} \times \mathcal{V}$ on the variables
$\mathcal{V}$ as follows. For any pair $u, v \in \mathcal{V}$, we write $u \preceq_P v$ if $u = v$ or $u$ belongs to a strict ancestor of $v$ in the
expression tree $P$. (Note that the same variable $u$ might occur several times in the expression tree, hence
it is not immediately obvious that $\preceq_P$ is indeed a partial order.)

Corollary 6.21. The binary relation $\preceq_P$ defines a partially ordered set.

Proof. Reflexivity and transitivity hold trivially. We check the antisymmetry property of $\preceq_P$. Suppose
$u \preceq_P v$, $v \preceq_P u$, but $u \neq v$. It cannot be the case that both $u$ and $v$ are semiring variables, because each
semiring variable occurs only once in $P$. If $u$ is a semiring variable and $v$ is not, then $v$ is both a strict
ancestor and a strict descendant of $u$. This means $v$ is a strict ancestor of a copy of itself, violating part
(a) of Lemma 6.20. So we are left with the case when both $u$ and $v$ are product variables. But in this case,
one of them (say, $u$) comes before the other (say, $v$) in the original expression $\varphi$, and thus by part (b) of
Lemma 6.20 $v$ cannot be an ancestor of $u$. □

The above corollary justifies the correctness of the precedence poset definition.

Definition 6.22 (Precedence poset). The precedence poset is the partially ordered set $P = (\mathcal{V}, \preceq_P)$. (We
abuse notation and use $P$ to denote the poset also.) Also, as in Definition 6.3 we let $\text{LinEx}(P)$ denote the
set of linear extensions of $P$.
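For small instances, the linear extensions of a precedence poset can be enumerated directly from the expression tree. The sketch below is a brute-force illustration only, assuming each variable occurs in a single node (no product-variable copies) and that the tree is given as a mapping from node identifiers to variable sets together with parent pointers; both helper names are hypothetical.

```python
from itertools import permutations

def tree_precedence(nodes, parent):
    """Pairs (u, v) such that u lies in a strict ancestor node of the node containing v."""
    pairs = set()
    for nid, variables in nodes.items():
        anc = parent[nid]
        while anc is not None:
            pairs.update((u, v) for u in nodes[anc] for v in variables if u != v)
            anc = parent[anc]
    return pairs

def linear_extensions(elements, precedes):
    """All orderings of `elements` in which u appears before v for every (u, v) in `precedes`."""
    result = []
    for perm in permutations(elements):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in precedes):
            result.append(perm)
    return result

# Expression tree of Example 6.13: root {X1, X3} with a single child {X2}.
NODES = {"root": {1, 3}, "child": {2}}
PARENT = {"root": None, "child": "root"}
LINEX = linear_extensions([1, 2, 3], tree_precedence(NODES, PARENT))
```

Enumerating all permutations is exponential in the number of variables, so this is meant purely as an executable reading of the definition, not as a practical procedure.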

We next prove the soundness of LinEx(P). 

Theorem 6.23 ($\text{LinEx}(P) \subseteq \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$). Every linear extension of the precedence poset is $\varphi$-equivalent.

Proof. Similar to the proof of Theorem 6.8, we only need to prove soundness for the compartmentalization
poset, i.e. the poset defined by the expression tree constructed using only the compartmentalization step. A
slightly tricky issue in this case compared to that of Theorem 6.8 is the multiple occurrences of some product
variables. Let $\sigma$ be the variable ordering used to write $\varphi$. We prove this claim by induction on the number
of tag blocks of the input sequence.

The base case of one single tag block holds trivially. In the inductive step, suppose $\sigma$ has at least two tag
blocks with $L$ being the first tag block. For each extended component $(V'(C), \mathcal{E}'(C))$ of $\mathcal{H} - L$, we form
a sub-sequence $\sigma_C$, which defines an FAQ-expression $\varphi_C$. This expression $\varphi_C$ is on the conditional factors
$\psi_S(\cdot \mid x_L)$ for each $S \in \mathcal{E}$ such that $S \cap V(C) \neq \emptyset$. (Recall that $V'(C)$ is different from $V(C)$ because $V'(C)$
also contains some product variables.) We also construct the expression $\varphi_D$ for the dangling component
$D$. This expression is on the conditional factors $\psi_S(\cdot \mid x_L)$ for each $S \in \mathcal{E}$ such that $S \setminus L \subseteq W$. When
we condition on the first $|L|$ variables, the expression $\varphi(\cdot \mid x_L)$ factorizes completely into a product of the
FAQ-expressions $\varphi_C$ and $\varphi_D$ (if $D$ is not empty). Note that the same product aggregate $\oplus^{(i)} = \otimes$ might
occur in several $\varphi_C$ and $\varphi_D$; this corresponds to different copies of $X_i$ in our construction.

Now, let $\pi$ be an arbitrary linear extension of the compartmentalization poset for $\varphi$. For each extended
component $(V'(C), \mathcal{E}'(C))$, let $\pi_C$ denote the subsequence of $\pi$ obtained by picking out variables in $V'(C)$.
Then, $\pi_C$ is a linear extension of the (compartmentalization) poset for $\varphi_C$, and thus $\pi_C \in \text{EVO}(\varphi_C, \mathcal{F}(\mathbf{D}_I))$
by induction. Note that the first $|L|$ variables in $\pi$ must be those in $L$. Hence, the FAQ-expression $\varphi_\pi$ defined
by $\pi$ is identical to $\varphi$ because $\varphi_\pi(\cdot \mid x_L)$ factorizes into exactly the $\varphi_C$ and $\varphi_D$. □

We now follow the script of Section 6.1 to prove the completeness of LinEx(P). 

Lemma 6.24. For every variable ordering $\pi = (u_1, \dots, u_n) \in \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$, the variable $u_{f+1}$ must
belong to a child node of the root of the expression tree.

Proof Sketch. The proof is similar to that of Lemma 6.9. The main difference is that instead of
considering a connected component, we will be considering an extended component. Hence in the path
$(i_0 := u_{f+1}, i_1, \dots, i_k \in L)$, we can assume that none of the intermediate vertices $i_1, \dots, i_{k-1}$ is a
product variable. In addition, instead of using the $\phi_{i_{k-1} i_k}$ defined in Proposition 6.7, we will be using the one
defined in Proposition 6.17 (whose range is $\mathbf{D}_I$). Also for all $S \in \mathcal{E}$, instead of $\psi_S(x_S)$ being $\mathbf{0}$ whenever
there is a variable $p \in S \setminus \{i_0, \dots, i_k\}$ for which $x_p \neq e_p$, the value of $\psi_S(x_S)$ could now be $\mathbf{0}$ or $\mathbf{1}$ depending
on whether $\oplus^{(p)}$ is a semiring or product aggregate. In particular, if there is $p \in S \setminus \{i_0, \dots, i_k\}$ for which
$x_p \neq e_p$ and $\oplus^{(p)}$ is a semiring aggregate, then $\psi_S(x_S) = \mathbf{0}$. Otherwise, if there is $p \in S \setminus \{i_0, \dots, i_k\}$ for
which $x_p \neq e_p$ and $\oplus^{(p)}$ is a product aggregate, then $\psi_S(x_S) = \mathbf{1}$. Otherwise, $\psi_S(x_S)$ is defined as in the
proof of Lemma 6.9. □

The definition of componentwise-equivalence remains the same as that of Section 6.1 except that instead
of taking connected components, we will be taking extended components (see Definition 6.18).

Definition 6.25 (Componentwise-equivalence). Let $\varphi$ be an FAQ-query, $\sigma = (v_1, \dots, v_n) \in \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$
be a variable ordering. Let $\pi = (u_1, \dots, u_n)$ be another variable ordering with $\{u_1, \dots, u_f\} = F$. Then, $\pi$ is
said to be componentwise-equivalent (or shortly CW-equivalent) to $\sigma$ if and only if:

• either $n = 1$,

• or $\mathcal{H}$ has $\geq 2$ extended components, and for each extended component $C = (V'(C), \mathcal{E}'(C))$ of $\mathcal{H}$, $\pi_C$ is
CW-equivalent to $\sigma_C$, where $\sigma_C$ (respectively, $\pi_C$) is the variable ordering of $V'(C)$ that is consistent
with $\sigma$ (respectively, $\pi$),

• or $\{u_1\} = \{v_1\} =: L$ is a semiring or free variable or (for some $p \geq 1$) $\{u_1, \dots, u_p\} = \{v_1, \dots, v_p\} =: L$
are product variables, and for each extended component $C = (V'(C), \mathcal{E}'(C))$ of $\mathcal{H} - L$, $\pi_C$ is
CW-equivalent to $\sigma_C$, where $\sigma_C$ (respectively, $\pi_C$) is the ordering of $V'(C)$ that is consistent with $\sigma$
(respectively, $\pi$).

Given a set of variable orderings $A \subseteq \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$, we use $\text{CWE}(A)$ to denote the set of all variable
orderings that are CW-equivalent to some variable ordering in $A$.

Notice that variables in the dangling product set can be placed anywhere later in the ordering, and hence
they are ignored in the above definition.

Proposition 6.26. Let $\pi$ be a variable ordering that is CW-equivalent to $\sigma \in \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$. Then we have

$$\pi \in \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I)) \quad \text{and} \quad \text{faqw}(\sigma) = \text{faqw}(\pi).$$

Proof. Same as Proposition 6.11. The only difference is that for any product variable $k' \in [n] - K$, the
set $U_{k'}$ is not considered in the definition of $\text{faqw}(\sigma)$ (Definition 5.10). Hence, an initial block of product
variables (such as $\{v_1, \dots, v_p\}$ in Definition 6.25) can be ordered arbitrarily. □

Theorem 6.27 ($\text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I)) = \text{CWE}(\text{LinEx}(P))$). Let $\varphi$ be an FAQ-expression of the form (21). A
variable ordering $\sigma$ is $\varphi$-equivalent if and only if it is CW-equivalent to some ordering $\pi$ which is a linear
extension of the precedence poset $P$.

Proof. Similar to the proof of Theorem 6.12. The containment $\text{CWE}(\text{LinEx}(P)) \subseteq \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$ follows
from Proposition 6.26 and Theorem 6.23. Hence, we only need to prove the other direction.

We prove the containment $\text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I)) \subseteq \text{CWE}(\text{LinEx}(P))$ by induction with a slightly stronger
induction hypothesis. We show that, if $\sigma$ is $\varphi$-equivalent, then there is a linear extension $\pi \in \text{LinEx}(P)$ that is
CW-equivalent to $\sigma$ and such that all product variables in $\pi$ have exactly the same relative order as that in $\sigma$.

Without loss of generality we can assume that the root of the expression tree $P$ is empty. Fix an arbitrary
$\sigma = (v_1, \dots, v_n) \in \text{EVO}(\varphi, \mathcal{F}(\mathbf{D}_I))$. By Lemma 6.24, $v_1 \in L$. We recognize two cases:

• If the tag of $L$ is not $\otimes$, then for every extended component $C$ of $\mathcal{H} - \{v_1\}$, define a sub-query $\varphi_C$ with
variable ordering $\sigma_C$ on the conditional factors

$$\{\psi_S(\cdot \mid x_{v_1}) \mid S \in \mathcal{E} \wedge S \cap V(C) \neq \emptyset\},$$

where $\sigma_C$ is the subsequence of $\sigma$ obtained by picking out vertices in $V'(C)$. Let $P_C$ be the precedence
poset of the expression tree for $\varphi_C$. By induction on the number of variables, we know that

$$\sigma_C \in \text{EVO}(\varphi_C, \mathcal{F}(\mathbf{D}_I)) \subseteq \text{CWE}(\text{LinEx}(P_C)).$$

Hence, there is a variable ordering $\pi_C \in \text{LinEx}(P_C)$ that is CW-equivalent to $\sigma_C$. Moreover, product
variables (if any) in $\pi_C$ maintain their relative order in $\sigma_C$. Although the same product variable(s)
might appear in different components, we can pick a variable ordering $\pi$ that is consistent with all $\pi_C$
such that $\pi$ starts with $v_1$, followed by variables in $L - \{v_1\}$. It follows that $\pi \in \text{LinEx}(P)$ and $\pi$ is
CW-equivalent to $\sigma$. Moreover, product variables in $\pi$ maintain their relative order in $\sigma$.

• If the tag of $L$ is $\otimes$, then $\mathcal{H} - \{v_1\}$ cannot have more than one extended component (using the same
argument as in the proof of Lemma 6.20). Applying Lemma 6.24 and the "one-extended-component"
argument repeatedly, we infer that $\{v_1, \dots, v_{|L|}\} = L$ and that $\mathcal{H} - L$ has at most one extended
component.

□


Corollary 6.28. We have

$$\text{faqw}(\varphi) = \min\{\text{faqw}(\sigma) \mid \sigma \in \text{LinEx}(P)\}.$$

6.3 General FAQ query

The only case of FAQ that does not fall under the previous section is the case where we have non-idempotent
product aggregates. This general case is not very natural, as it is hard to find practical examples that
can only be represented using this form. For completeness, we describe how to handle it in this section.

Definition 5.2 defines idempotence from an algorithmic point of view: A product aggregate is called
idempotent at the time it is about to be eliminated in InsideOut if the ranges of all factors containing it at
that time are idempotent with respect to $\otimes$. This definition not only depends on the specific elimination ordering
being used, but also on the input factors (i.e. it is data-dependent). While this definition makes sense from an
algorithmic point of view, it does not capture idempotence at the semantic level.

From a semantic point of view, we can only reason about whether the product is idempotent under the
entire domain $\mathbf{D}$ (or at least a closed subset $\mathbf{D}_I$ of the domain, as we did in Section 6.2). More specifically
in this section, instead of thinking of idempotence as a property of each product aggregate, we will think of
it as a property of the product operator $\otimes$ under the domain $\mathbf{D}$. A product operator $\otimes$ is idempotent under
$\mathbf{D}$ if and only if $a \otimes a = a$ for all $a \in \mathbf{D}$. Here, we are interested in the case where $\otimes$ is not idempotent under
$\mathbf{D}$.

We will explain the basic ideas using a simple example. 

Example 6.29 (FAQ with non-idempotent <g> under D). Consider the following FAQ. 

* = EIIE ^13(2:1 ,X3)l/} 2 (X2). 

Xi X 2 x 3 

Notice that $\varphi$ consists of two connected components (or two extended components as defined in Section 6.2.3).
However, assuming the product is non-idempotent under $D$, $\varphi$ is written as

\[ \varphi = \left[ \sum_{x_1} \left( \sum_{x_3} \psi_{13}(x_1, x_3) \right)^{|\mathrm{Dom}(X_2)|} \right] \left[ \prod_{x_2} \psi_2(x_2) \right]. \]

Although $X_2$ is connected to neither $X_1$ nor $X_3$, it imposes an order on them. However, if we construct the
expression tree directly as described in Section 6.2.3, it is going to be equivalent to the following expression.

\[ \varphi = \left[ \sum_{x_1, x_3} \psi_{13}(x_1, x_3) \right] \left[ \prod_{x_2} \psi_2(x_2) \right]. \]


The above expression fails to capture the fact that $X_1$ must come before $X_3$. (Notice that this situation
does not necessarily happen between sibling nodes in the expression tree. It could have happened between 
arbitrary nodes that are incomparable under the precedence poset of the expression tree.) 

To solve this issue, we extend all factors by adding all product variables to them, and then we construct
the expression tree for the extended $\varphi$ exactly as described in Section 6.2.3. In this example, we extend $\psi_{13}$
into $\psi_{123}$ which is defined for all $(x_1, x_2, x_3) \in \mathrm{Dom}(X_1) \times \mathrm{Dom}(X_2) \times \mathrm{Dom}(X_3)$ as follows.

\[ \psi_{123}(x_1, x_2, x_3) = \psi_{13}(x_1, x_3). \]

The extended $\varphi$ (let's call it $\varphi'$) will be

\[ \varphi' = \sum_{x_1} \prod_{x_2} \sum_{x_3} \psi_{123}(x_1, x_2, x_3)\, \psi_2(x_2). \]

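The expression tree of $\varphi'$ (under Definition 6.18) is going to be equivalent to the following expression.

\[ \varphi' = \left[ \sum_{x_1} \prod_{x_2} \sum_{x_3} \psi_{123}(x_1, x_2, x_3) \right] \left[ \prod_{x_2} \psi_2(x_2) \right]. \]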
Now, this new expression forces $X_1$ to come before $X_3$ in the order. Notice that the above extension takes
place only at the semantic level (i.e. during the construction of the expression tree). At the algorithmic level, we
will still be working over the original $\varphi$. For example, after eliminating $X_3$ above in InsideOut, $X_2$ is going to
be eliminated from $\sum_{x_3} \psi_{13}(x_1, x_3)$ by raising the original $\sum_{x_3} \psi_{13}(x_1, x_3)$ to the power of $|\mathrm{Dom}(X_2)|$
(using repeated squaring).
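
The identity behind Example 6.29 can be checked numerically. The following is a minimal sketch over the ordinary sum-product semiring on integers; the domains and factor values are made up for illustration and are not taken from the paper. It compares the literal nested evaluation with the factored form that raises the inner sum to the power $|\mathrm{Dom}(X_2)|$, and shows that the expression that drops the power gives a different value.

```python
from itertools import product
from math import prod

# Hypothetical toy data (not from the paper): small domains and integer factors.
dom1, dom2, dom3 = [0, 1], [0, 1, 2], [0, 1]
psi13 = {(a, c): a + 2 * c + 1 for a, c in product(dom1, dom3)}
psi2 = {0: 2, 1: 3, 2: 5}


def nested():
    """Evaluate sum_{x1} prod_{x2} sum_{x3} psi13(x1, x3) * psi2(x2) literally."""
    return sum(
        prod(sum(psi13[a, c] * psi2[b] for c in dom3) for b in dom2)
        for a in dom1
    )


def power_by_squaring(base, exp):
    """Compute base**exp using O(log exp) multiplications."""
    result = 1
    while exp > 0:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
    return result


def factored():
    """[sum_{x1} (sum_{x3} psi13)^{|Dom(X2)|}] * [prod_{x2} psi2(x2)]."""
    left = sum(
        power_by_squaring(sum(psi13[a, c] for c in dom3), len(dom2))
        for a in dom1
    )
    return left * prod(psi2.values())


def wrong_tree():
    """[sum_{x1, x3} psi13] * [prod_{x2} psi2(x2)]: the form that loses the power."""
    return sum(psi13.values()) * prod(psi2.values())
```

Because ordinary multiplication is not idempotent, `nested()` and `factored()` agree while `wrong_tree()` does not, which is the ordering constraint the extended factors are meant to capture.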

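Definition 6.30 (Expression tree and precedence poset). Given an FAQ query $\varphi$ (as defined in Section 1.2)
with hypergraph $\mathcal{H}$ and domain $D$ where $\otimes$ is not idempotent under $D$, let $K$ be the set of semiring aggregates
and free variables (as defined by (13)) and $\overline{K} := [n] - K$ be the set of product aggregates. We define a new
FAQ query $\varphi'$ with domain $D$ and hypergraph $\mathcal{H}' = ([n], \mathcal{E}')$ where

\[ \mathcal{E}' := \{ S \cup \overline{K} \mid S \in \mathcal{E} \}, \]

and $\varphi' : \prod_{i \in [f]} \mathrm{Dom}(X_i) \to D$ is defined as

\[ \varphi'(x_{[f]}) := \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \bigotimes_{S' \in \mathcal{E}'} \psi'_{S'}(x_{S'}), \]

where for each $S \cup \overline{K} \in \mathcal{E}'$, $\psi'_{S \cup \overline{K}} : \prod_{i \in S \cup \overline{K}} \mathrm{Dom}(X_i) \to D$ is defined as

\[ \psi'_{S \cup \overline{K}}(x_{S \cup \overline{K}}) := \psi_S(x_S). \]

The expression tree of $\varphi$ is constructed by applying Definition 6.18 to $\varphi'$. The precedence poset is constructed
by applying Definition 6.3 to the expression tree of $\varphi$.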
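
The extension in Definition 6.30 is mechanical, and a short sketch may help. The code below is an illustration only: the function names and the representation of factors as Python callables on value tuples are assumptions made here, not part of the paper. It pads every hyperedge with the product variables and wraps a factor so that it ignores the added coordinates.

```python
def extend_hyperedges(edges, product_vars):
    """Return the list of S ∪ K̄ for each hyperedge S, where K̄ = product_vars."""
    k_bar = frozenset(product_vars)
    return [frozenset(S) | k_bar for S in edges]


def extend_factor(psi, S, product_vars):
    """Wrap psi (a function of a value tuple ordered as S) into psi' on S ∪ K̄.

    Returns the sorted extended support and a function taking a dict
    {variable: value} that covers the extended support. The extended factor
    ignores the added product variables, so psi'(x_{S ∪ K̄}) = psi(x_S).
    """
    extended_support = tuple(sorted(set(S) | set(product_vars)))

    def psi_extended(assignment):
        return psi(tuple(assignment[v] for v in S))

    return extended_support, psi_extended
```

On Example 6.29, with edges $\{1,3\}$ and $\{2\}$ and product variable $X_2$, the first edge becomes $\{1,2,3\}$ and the second is unchanged. The extension is used only to build the expression tree; the algorithm still runs on the original factors.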

Notice that because every factor in $\varphi'$ contains all product variables $\overline{K}$, it doesn't really matter anymore
whether product is idempotent or not. In particular, the situation of having to raise to a power of $|\mathrm{Dom}(X_i)|$
won't be happening anymore in $\varphi'$ (at the semantic level). Hence, we can run the exact same analysis from
Section 6.2 on $\varphi'$ in order to prove that:

Theorem 6.31 ($\mathrm{LinEx}(P) \subseteq \mathrm{EVO}(\varphi)$). Every linear extension of the precedence poset $P$ is $\varphi$-equivalent.

Notice again that we are merely using $\varphi'$ to semantically determine which orderings $\sigma$ are in $\mathrm{EVO}(\varphi)$.
The algorithm (e.g. InsideOut) will still be running on $\varphi$.

While the above definition of the expression tree captures completeness at an intuitive level (as we argued 
for in Example 6.29), achieving completeness in a rigorous way requires long definitions and unnatural 
assumptions. Because this class of FAQ is not well-motivated by any practical examples, we skip the rigorous 
completeness definitions/proofs for this section. 

7 Approximating the FAQ-width 

Recently, it has been shown that computing the fractional hypertree width (fhtw) is NP-hard [37]. By 
extension, the problem of computing a tree decomposition with the (optimal) fractional hypertree width 
is also NP-hard (since having that tree decomposition enables computing fhtw in polynomial time). As 
was shown in Proposition 5.12, our FAQ-width (faqw) is an exact generalization of the fractional hypertree 
width: it coincides with fhtw for SumProd queries and for queries where all variables are free. By extension, 
computing faqw and finding a tree decomposition with the optimal faqw are both NP-hard. 

Marx [60] had suggested a polynomial-time approximation algorithm that, for any constant w, given a 
hypergraph whose fhtw is at most $w$, outputs a tree decomposition whose fhtw is $O(w^3)$. In this section,
we aim to design a polynomial-time approximation algorithm for the FAQ-width. Our algorithm is going to 
use any approximation algorithm for fhtw (such as Marx’s) as a blackbox, and our approximation guarantee 
is going to be in terms of the approximation guarantee of the blackbox algorithm. Our approximation 
algorithm is also going to rely on the expression tree constructed in Section 6. 
7.1 FAQ with only semiring aggregates

Recall the definition of the FAQ-width (Definition 5.10):

\[ \mathrm{faqw}(\sigma) := \max_{k \in K} \rho^*_{\mathcal{H}}(U_k^{\sigma}). \]

When there is no product aggregate, $K$ (as defined in (13)) is exactly $[n]$. Let $P$ be the expression tree
constructed from the query $\varphi$ in the form (17). We define notation that will be used throughout this section.
Let $C$ be a node of the expression tree $P$. (This means $C$ is a set of variables of $\varphi$, and all variables in $C$
have the same tag.) Let $L$ be the parent node of $C$ (if any) in the expression tree $P$. We define the following
sets:

\begin{align}
\mathcal{E}(C) &:= \{ S \in \mathcal{E} \mid S \cap C' \neq \emptyset \text{ for some node } C' \text{ in the subtree of } P \text{ rooted at } C \}, \tag{24} \\
S_{L,C} &:= L \cap \Big( \bigcup_{S \in \mathcal{E}(C)} S \Big), \tag{25} \\
U(C) &:= \bigcup_{L' \text{ an ancestor of } C} \Big( L' \cap \bigcup_{S \in \mathcal{E}(C)} S \Big). \tag{26}
\end{align}

If $C$ has no parent then $U(C) = \emptyset$ by default. We think of the set $S_{L,C}$ as the contribution of all the nodes in
the $C$-branch to $L$, and the set $U(C)$ as the contribution of all the nodes in the $C$-branch to all the (strict)
ancestors of $C$. Next, for every node $L$ in the expression tree $P$, define the hypergraph $\mathcal{H}_L$ as follows.

• If $L$ is a leaf node of $P$, then $\mathcal{H}_L = \mathcal{H}[L]$, the subgraph of $\mathcal{H}$ induced by $L$.

• If $L$ is not a leaf node of $P$, then $\mathcal{H}_L = (L, \mathcal{E}_L)$, where

\[ \mathcal{E}_L := \{ S \cap L \mid (S \in \mathcal{E}) \wedge (S \cap L \neq \emptyset) \wedge (S \cap C = \emptyset, \ \forall \text{ descendant } C \text{ of } L) \} \cup \{ S_{L,C} \mid C \text{ a child of } L \}. \]

In other words, $\mathcal{E}_L$ is the set of all projections of $S$ down to $L$ for all hyperedges $S$ for which $S$ intersects
$L$ but not any descendant of $L$ in the expression tree; and for each child $C$ of $L$, $\mathcal{E}_L$ also contains the
projection onto $L$ of the union of all hyperedges $S$ that intersect (some descendant of) $C$.
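
The sets in (24) through (26) and the edge set $\mathcal{E}_L$ can be computed directly from the expression tree. The sketch below is illustrative only and covers the semiring-only setting of this subsection. The tree encoding (dictionaries `nodes`, `children`, `parent`) and all function names are assumptions made here. It also assumes that the induced subgraph $\mathcal{H}[L]$ has edge set $\{S \cap L \mid S \cap L \neq \emptyset\}$, so one formula serves both the leaf and the non-leaf case, and it drops empty edges.

```python
def subtree_nodes(node, children):
    """All nodes in the subtree rooted at `node`, with `node` itself first."""
    found = [node]
    for child in children.get(node, []):
        found.extend(subtree_nodes(child, children))
    return found


def union(sets):
    """Union of an iterable of sets (empty set if the iterable is empty)."""
    result = set()
    for s in sets:
        result |= s
    return result


def edges_of_branch(C, nodes, children, edges):
    """E(C) from (24): hyperedges meeting some node in the subtree rooted at C."""
    below = union(nodes[d] for d in subtree_nodes(C, children))
    return [S for S in edges if S & below]


def residue_on_parent(L, C, nodes, children, edges):
    """S_{L,C} from (25): L intersected with the union of E(C)."""
    return frozenset(nodes[L] & union(edges_of_branch(C, nodes, children, edges)))


def residue_on_ancestors(C, nodes, children, parent, edges):
    """U(C) from (26): variables of strict ancestors of C covered by E(C)."""
    ancestor_vars = set()
    node = parent.get(C)
    while node is not None:
        ancestor_vars |= nodes[node]
        node = parent.get(node)
    return frozenset(ancestor_vars & union(edges_of_branch(C, nodes, children, edges)))


def node_hyperedges(L, nodes, children, edges):
    """Edge set E_L of the hypergraph H_L (empty edges are dropped)."""
    strict_desc = union(nodes[d] for d in subtree_nodes(L, children)[1:])
    own = {
        frozenset(S & nodes[L])
        for S in edges
        if S & nodes[L] and not (S & strict_desc)
    }
    from_children = {
        residue_on_parent(L, C, nodes, children, edges)
        for C in children.get(L, [])
    }
    return {e for e in own | from_children if e}
```

For a two-node tree with root $L = \{1,2,5\}$, child $C = \{3,4\}$ and hyperedges $\{1,3\}, \{2,4\}, \{1,5\}, \{3,4\}$, this gives $S_{L,C} = U(C) = \{1,2\}$ and $\mathcal{E}_L = \{\{1,5\}, \{1,2\}\}$.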

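We next prove a simple lower bound for $\mathrm{faqw}(\varphi)$ that leads to an approximation algorithm for computing
$\mathrm{faqw}(\varphi)$ using an approximation algorithm for fhtw as a blackbox.

Lemma 7.1. For any node $L$ in the expression tree,

\begin{align*}
\mathrm{faqw}(\varphi) &\ge \mathrm{fhtw}(\mathcal{H}_L), \\
\mathrm{faqw}(\varphi) &\ge \rho^*_{\mathcal{H}}(U(L)).
\end{align*}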


Proof. To show the first inequality, by Corollary 6.14, it is sufficient to prove that $\mathrm{faqw}(\sigma) \ge \mathrm{fhtw}(\mathcal{H}_L)$ for
any variable ordering $\sigma = (v_1, \ldots, v_n) \in \mathrm{LinEx}(P)$ and for any node $L$ in the expression tree.

If $L$ is a leaf node of the expression tree, then $\mathrm{faqw}(\sigma) \ge \mathrm{fhtw}(\mathcal{H}) \ge \mathrm{fhtw}(\mathcal{H}_L)$ because $\mathcal{H}_L$ is an induced
subgraph of $\mathcal{H}$. Now, suppose $L$ is not a leaf node. For any child $C$ of $L$, let $k$ be the smallest integer
such that $v_k$ belongs to some node in the subtree rooted at $C$. Then, due to the fact that $\sigma \in \mathrm{LinEx}(P)$, if
$v_j \in L$ then $j < k$. The set $U_k^{\sigma}$ is precisely equal to $U(C) \cup \{v_k\}$. This is because each time we eliminate a
vertex belonging to any node in the subtree rooted at $C$, we insert back a hyperedge interconnecting all its
neighbors (to the next hypergraph in the hypergraph sequence). And so by the time we reach $v_k$, all of the
nodes in $U(C)$ belong to the same hyperedge of $\mathcal{H}_k^{\sigma}$. It follows that $(U_k^{\sigma} - \{v_k\}) \cap L = U(C) \cap L = S_{L,C}$
(since $L \cap L' = \emptyset$ for any ancestor $L'$ of $L$). From this observation we obtain:


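\begin{align*}
\mathrm{faqw}(\varphi) &= \min_{\sigma} \big( \max \{ \rho^*_{\mathcal{H}}(U_k^{\sigma}) \mid k \in [n] \} \big) \\
&\ge \min_{\sigma} \big( \max \{ \rho^*_{\mathcal{H}}(U_j^{\sigma}) \mid j \in L \} \big) \\
&\ge \min_{\sigma} \big( \max \{ \rho^*_{\mathcal{H}}(U_j^{\sigma} \cap L) \mid j \in L \} \big) && (\rho^* \text{ is monotone}) \\
&\ge \min_{\tau} \big( \max \{ \rho^*_{\mathcal{H}_L}(U_j^{\tau}) \mid j \in L \} \big) \\
&= \mathrm{fhtw}(\mathcal{H}_L).
\end{align*}

In the above, $\tau$ is taken only over all variable orderings of $L$ instead of the entire set $[n]$.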

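We next prove the second inequality that, for any $\sigma = (v_1, \ldots, v_n) \in \mathrm{LinEx}(P)$, and any node $C$ in the
expression tree, we have $\mathrm{faqw}(\sigma) \ge \rho^*_{\mathcal{H}}(U(C))$. As we have observed above, let $k$ be the smallest integer such
that $v_k \in C$, then $U_k^{\sigma} = U(C) \cup \{v_k\}$. Hence, because $\rho^*$ is monotone,

\[ \mathrm{faqw}(\sigma) = \max_{j \in [n]} \rho^*_{\mathcal{H}}(U_j^{\sigma}) \ge \rho^*_{\mathcal{H}}(U_k^{\sigma}) \ge \rho^*_{\mathcal{H}}(U(C)). \]

□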


Theorem 7.2. Let $\varphi$ be any FAQ query whose hypergraph is $\mathcal{H}$ and all variable aggregates are semiring
aggregates. Suppose there is an approximation algorithm that, given any hypergraph $\mathcal{H}'$, outputs a tree
decomposition of $\mathcal{H}'$ with fractional hypertree width at most $g(\mathrm{fhtw}(\mathcal{H}'))$ in time $t(|\mathcal{H}'|, \mathrm{fhtw}(\mathcal{H}'))$ for some
non-decreasing functions $g, t$. Then, we can in time $|\mathcal{H}| \cdot t(|\mathcal{H}|, \mathrm{faqw}(\varphi))$ compute a $\varphi$-equivalent vertex
ordering $\sigma$ such that

\[ \mathrm{faqw}(\sigma) \le \mathrm{faqw}(\varphi) + g(\mathrm{faqw}(\varphi)). \]

Proof. We use the blackbox approximation algorithm for fhtw to construct a tree decomposition $(T_L, \chi_L)$ for
every hypergraph $\mathcal{H}_L$ where $L$ is a node in the expression tree. Then, from each of those tree decompositions,
we construct a variable ordering $\sigma_L$ for variables in the set $L$ in the standard way. Finally, we construct
the variable ordering $\sigma$ for $[n]$ by concatenating all the $\sigma_L$ together in any way that respects the precedence
partial order.

Suppose $\sigma = (v_1, \ldots, v_n)$ is the resulting variable ordering. Consider an arbitrary vertex $v_k$. Let $L$ be
the node of the precedence tree that contains $v_k$. Let $B$ be the bag in $(T_L, \chi_L)$ that $v_k$ belonged to when it
was eliminated to construct $\sigma_L$. Then, using the same argument as in the proof of Lemma 7.1, we can show
that

\[ U_k^{\sigma} \subseteq B \cup U(L). \tag{27} \]

To see this, first consider the simpler case when $L$ is a leaf node of the expression tree. Then, when we
eliminate $v_k$ the set $U_k^{\sigma}$ is the union of the sets $S \in \partial(v_k)$. The part $U_k^{\sigma} \cap L$ is covered by $B$ because within $L$
the elimination algorithm works on $\mathcal{H}_L$. The part $U_k^{\sigma} \setminus L$ is covered by the maximum residue left over from
eliminating all vertices in $L$. The residue is precisely the set $U(L)$, because every time we eliminate a vertex
we collect all its neighbors together into a hyperedge. By the time the last vertex from $L$ is eliminated, the
entire set $U(L)$ becomes a hyperedge. Now, if $L$ is not a leaf node, the situation is exactly the same except
for the fact that we work on the graph $\mathcal{H}_L$ which is not necessarily the same as $\mathcal{H}[L]$. The hypergraph $\mathcal{H}_L$
contains the restrictions on $L$ of all the residues of the subtrees under $L$.

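Next, from Lemma 7.1 and from the fact that $\rho^*$ is subadditive, relation (27) implies

\begin{align*}
\rho^*_{\mathcal{H}}(U_k^{\sigma}) &\le \rho^*(B) + \rho^*(U(L)) \\
&\le g(\mathrm{fhtw}(\mathcal{H}_L)) + \mathrm{faqw}(\varphi) \\
&\le g(\mathrm{faqw}(\varphi)) + \mathrm{faqw}(\varphi).
\end{align*}

Finally,

\[ \mathrm{faqw}(\sigma) = \max_{k \in [n]} \rho^*_{\mathcal{H}}(U_k^{\sigma}) \le g(\mathrm{faqw}(\varphi)) + \mathrm{faqw}(\varphi). \]

□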
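
The assembly step in the proof of Theorem 7.2 can be sketched as follows. This is an illustration under assumptions: the per-node orderings `node_order[L]` stand in for the orderings $\sigma_L$ obtained from the blackbox tree decompositions (they are simply given as input here), the expression tree is assumed to have a single root, and a pre-order traversal is used as one concrete way of concatenating that places every node before its descendants.

```python
def concatenate_orderings(root, children, node_order):
    """Concatenate per-node variable orderings in pre-order.

    `children` maps a node to its list of child nodes, and `node_order` maps a
    node to the ordering of its own variables. Every node's variables appear
    before the variables of all of its descendants, so the result respects the
    ancestor-before-descendant precedence order.
    """
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.extend(node_order[node])
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(children.get(node, [])))
    return order
```

Any other traversal that visits parents before children would serve equally well, since the theorem only requires the concatenation to respect the precedence partial order.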
7.2 FAQ with an inner FAQ-formula closed under idempotent elements

The key difference between this section and Section 7.1 is that $K$ (defined by (13)) is not necessarily equal to
$[n]$: instead it is now the union of $F = [f]$ and the set of semiring variables. We will still follow the same
strategy as that of Section 7.1, though the definition of the sets $S_{L,C}$ for each node $L$ and a child $C$ of $L$ is
a bit more delicate.

Definition 7.3 (Semiring node and product node). A node in the expression tree $P$ is called a semiring
node if its variables have a tag forming a semiring with $\otimes$. Otherwise, the node is called a product node.

Let $C$ be a node of the expression tree $P$. Let $L$ be the parent node of $C$ (if any) in the expression tree
$P$. We define $\mathcal{E}(C)$ differently from the previous section as follows:

\[ \mathcal{E}(C) := \{ S \in \mathcal{E} \mid S \cap C' \neq \emptyset \text{ for some semiring node } C' \text{ in the subtree of } P \text{ rooted at } C \}. \tag{28} \]

(Compare the above definition with (24).) We define $S_{L,C}$ and $U(C)$ using (25) and (26) (where the $\mathcal{E}(C)$
referred to in (25) and (26) is now the one defined by (28)). Note in particular that if a node does not
have any semiring descendant then $U(C)$ is empty. We think of $S_{L,C}$ as the residue imposed on $L$ from the
process of eliminating all semiring variables under the $C$-subtree.

We next define the hypergraphs $\mathcal{H}_L$, similar to what was defined in Section 7.1, with one small difference:

• If $L$ is a leaf node of $P$, then $\mathcal{H}_L = \mathcal{H}[L]$, the subgraph of $\mathcal{H}$ induced by $L$.

• If $L$ is not a leaf node of $P$, then $\mathcal{H}_L = (L, \mathcal{E}_L)$, where

\[ \mathcal{E}_L := \{ S \cap L \mid (S \in \mathcal{E}) \wedge (S \cap L \neq \emptyset) \wedge (S \cap C = \emptyset, \ \forall \text{ semiring descendant } C \text{ of } L) \} \cup \{ S_{L,C} \mid C \text{ a child of } L \}. \]

Here is where $\mathcal{E}_L$ is different from the corresponding definition of $\mathcal{E}_L$ in Section 7.1. We only take the
projection of the hyperedges $S$ for which $S$ does not intersect any semiring descendant of $L$. The key
point to notice is that, if $S$ intersects some semiring descendant, then its contribution to the node $L$ is
summarized already in some $S_{L,C}$.

We can now follow the script of Section 7.1. The proofs of the following lower bounds are identical to that
of Lemma 7.1, and thus are omitted. Note that the bounds only hold for the semiring nodes.

Lemma 7.4. For any semiring node $L$ in the expression tree, we have

\begin{align*}
\mathrm{faqw}(\varphi) &\ge \mathrm{fhtw}(\mathcal{H}_L), \\
\mathrm{faqw}(\varphi) &\ge \rho^*_{\mathcal{H}}(U(L)).
\end{align*}

We can now design the approximation algorithm. 

Theorem 7.5. Let $\varphi$ be any FAQ query of the form (21) whose hypergraph is $\mathcal{H}$. Suppose there is an
approximation algorithm that, given any hypergraph $\mathcal{H}'$, outputs a tree decomposition of $\mathcal{H}'$ with fractional
hypertree width at most $g(\mathrm{fhtw}(\mathcal{H}'))$ in time $t(|\mathcal{H}'|, \mathrm{fhtw}(\mathcal{H}'))$ for some non-decreasing functions $g, t$. Then,
we can in time $|\mathcal{H}| \cdot t(|\mathcal{H}|, \mathrm{faqw}(\varphi))$ compute a $\varphi$-equivalent vertex ordering $\sigma$ such that

\[ \mathrm{faqw}(\sigma) \le \mathrm{faqw}(\varphi) + g(\mathrm{faqw}(\varphi)). \]

Proof. First, we construct a “super” tree from the expression tree $P$ by replacing every semiring node $L$
of $P$ by a single “super-variable”. After this is done, the super tree contains super variables and product variables.
Since $P$ defines a poset on the descendant relation, the super tree also defines a poset on the descendant relation. We
call this poset the “super poset”.

Now, we take any linear extension of the super poset. Then, for every super variable (which corresponds to
a semiring node $L$), we use the blackbox approximation algorithm to construct a tree decomposition $(T_L, \chi_L)$
for the hypergraph $\mathcal{H}_L$. Then, from each of those tree decompositions, we construct a variable ordering
$\sigma_L$ for variables in the set $L$. Finally, we replace the occurrence of each super variable in the linear extension
by the corresponding variable ordering $\sigma_L$. Call the final variable ordering $\sigma$; it is a linear extension of the
precedence poset $P$.

Suppose $\sigma = (v_1, \ldots, v_n)$ is the resulting variable ordering. Consider an arbitrary vertex $v_k$ in a semiring
node $L$. Following the argument in the proof of Theorem 7.2, we can see that $U_k^{\sigma} \subseteq B \cup U(L)$. From
Lemma 7.4 and from the fact that $\rho^*$ is subadditive, it follows that

\[ \rho^*_{\mathcal{H}}(U_k^{\sigma}) \le \rho^*(B) + \rho^*(U(L)) \le g(\mathrm{fhtw}(\mathcal{H}_L)) + \mathrm{faqw}(\varphi) \le g(\mathrm{faqw}(\varphi)) + \mathrm{faqw}(\varphi). \]

The rest of the proof is identical to that of Theorem 7.2. □

Note that requiring different copies of the same variable to be consecutive is not necessary for InsideOut 
to work. The algorithm works even if we consider them to be different variables on the same domain. 
However, the collapsing of different copies back to the original copy is needed for the rigor of the definition
of $\mathrm{EVO}(\varphi, \mathcal{F}(D_I))$.

Note also that Theorem 7.5 implies Theorem F.11 and Theorem 7.2. Applying the above theorem using
the approximation algorithm from Marx [60], we obtain the following.

Corollary 7.6. Let $\varphi$ be any FAQ query of the form defined in (21). Suppose $\mathrm{faqw}(\varphi) \le c$ for some constant
$c$.$^{14}$ Then, the query can be answered in time

\[ \tilde{O}\big( N^{O(\mathrm{faqw}^3(\varphi))} + \|\varphi\| \big). \]

(Recall that $\tilde{O}$ hides a factor that is polynomial in query complexity and logarithmic in data complexity.)

7.2.1 Corollaries in Logic 

Using the reduction in Example A.20 from QCQ to FAQ, we obtain 

Corollary 7.7. QCQ is tractable for the class of quantified conjunctive queries $\varphi$ where faqw is bounded.

To determine which classes of QCQ formulas are tractable, Chen and Dalmau [24] defined the notion of
a prefixed graph and its width. The definition is in the beginning of Section III of their paper. The prefixed
graph's width corresponds exactly to the quantity $\max_{\sigma} \max_{k \in K} \{|U_k^{\sigma}|\}$. Since $\rho^*_{\mathcal{H}}(U_k) \le |U_k|$, $\mathrm{faqw}(\varphi)$ is
a stronger notion and our result implies the main positive result in their paper (the first part of Theorem
3.1). Furthermore, we can construct families of QCQ instances for which $\mathrm{faqw}(\varphi)$ is bounded but the prefixed
graph's width is unbounded. For example, consider the following quantified conjunctive query


\[ \varphi = \forall X_1 \cdots \forall X_n \, \exists X_{n+1} \left( S(X_1, \ldots, X_n) \wedge \bigwedge_{i \in [n]} R(X_i, X_{n+1}) \right). \]

Chen-Dalmau's prefixed graph's width is $n + 1$ in this case, but faqw is 2. Consequently, our result is strictly
stronger. The next corollary resolves an open question posed at the end of Durand and Mengel's paper [34].

Corollary 7.8. #QCQ is tractable for the class of quantified conjunctive queries $\varphi$ whose faqw is bounded.

14 This requirement is inherited from Marx's approximation algorithm [60].
8 Input and Output Representation

Thus far we stated our results for the case when the input and output factors are in the listing representation 
(Definition 4.1). Tracing back through the key steps of the InsideOut (and OutsideIn) algorithm, it is easy to see that the
algorithm works for a more general setting than just the listing representation. Section 8.1 presents the two 
conditions that we need from the input factor representation for all our previous arguments to go through. 
In Sections 8.2 and 8.3, we consider more compressed forms of input factor representations than the listing 
representation. In particular, Section 8.2 considers other representations that can be reduced to the listing 
representation (at the cost of having to compute a modified FAQ), which means we can still use InsideOut as 
is. In Section 8.3 we consider even more succinct representations where the reductions in Section 8.2 are too 
costly. In particular, for the special case of SAT and #SAT, we show that we can replace OutsideIn with a
more specific algorithm, which allows us to recover known results on polynomial-time solvability of certain 
classes of SAT and #SAT formulas. While these results only re-prove existing results, we consider these as 
an illustration of the power of the FAQ (and variable elimination) framework. We then switch to considering 
situations where one might be interested in output representations other than the listing representation in 
Section 8.4. Finally in Section 8.5, we consider the case when the output and input representations match, 
which naturally leads to the notion of composing FAQs. FAQ composition is also needed in Section 8.2. For 
the sake of clarity, we will almost exclusively focus on FAQ-SS in this section. 

8.1 The factor oracle 

In many application areas of FAQ, there are many ways the input factors can be represented. Appendix G 
lists some of the main representations collected from areas such as logic, PGM, and matrix computation. 
It should be obvious that the computational complexity of FAQ is highly dependent on how the inputs are 
specified. To generalize our results, we advocate for the following oracle model, which is sufficiently general 
to capture existing models of input and output representations. We note that our assumptions here are met 
by all of the representations discussed here and in Appendix G - the only difference is in the price the oracle 
pays to answer the query. 

Assumption 1 (Conditional query assumption). We assume that $\mathrm{Dom}(X_i)$ are totally ordered.$^{15}$ Each
factor oracle for $\psi_S$ is capable of answering the following query called the conditional query. Let $0 \le k \le n - 1$
be any integer such that $k + 1 \in S$. Let $x_{[k]} = (x_1, \ldots, x_k)$ be a vector such that $\psi_S(\cdot \mid x_{[k]}) \not\equiv 0$. Let
$y \in \mathrm{Dom}(X_{k+1})$ be arbitrary. Then, the factor oracle for $\psi_S$ can return a minimum value $x_{k+1} \ge y$ for which
$\psi_S(\cdot \mid x_{[k+1]}) \not\equiv 0$, i.e. it returns

\[ \min \{ x_{k+1} \mid (x_{k+1} \ge y) \wedge (\psi_S(\cdot \mid x_{[k+1]}) \not\equiv 0) \}. \]

If no such $x_{k+1}$ exists, then $+\infty$ (just a symbol) is returned.

A value query returns the value $\psi_S(x_S)$ for some given $x_S$. For simplicity we will also call those value
queries conditional queries.
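
For the listing representation, a conditional query amounts to a lexicographic search. The sketch below is one possible realization, not the paper's implementation: the class name, the use of a sorted Python list with `bisect`, and the use of `float("inf")` for the symbol $+\infty$ are all assumptions made for illustration. Tuples are stored in the fixed global variable order restricted to $S$, and only non-zero entries are kept.

```python
from bisect import bisect_left

INF = float("inf")  # stands in for the symbol +infinity


class ListingOracle:
    """Factor oracle over a listing representation (non-zero entries only)."""

    def __init__(self, entries):
        # entries: dict mapping value tuples (ordered as the variables of S)
        # to factor values; zero-valued entries are dropped.
        self.values = {t: v for t, v in entries.items() if v != 0}
        self.tuples = sorted(self.values)

    def conditional_query(self, prefix, y):
        """Smallest next-coordinate value >= y that extends `prefix`, else INF."""
        prefix = tuple(prefix)
        i = bisect_left(self.tuples, prefix + (y,))
        if i < len(self.tuples) and self.tuples[i][:len(prefix)] == prefix:
            return self.tuples[i][len(prefix)]
        return INF

    def value_query(self, x):
        """Value of the factor at the full tuple x (0 if x is not listed)."""
        return self.values.get(tuple(x), 0)
```

With a sorted list each conditional query costs a logarithmic number of tuple comparisons, which is the kind of per-query price the oracle model leaves unspecified.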

The second assumption about the input oracles is the following. 

Assumption 2 (Product-marginalization assumption). Let $\psi_S$ be an input oracle and $i \in S$. The input
oracle for $\psi_S$ can return another oracle on $S - \{i\}$, denoted by $\psi_{S - \{i\}}$, defined by

\[ \psi_{S - \{i\}}(x_{S - \{i\}}) = \bigotimes_{x_i \in \mathrm{Dom}(X_i)} \psi_S(x_S). \]

The number of $\otimes$ operations performed to compute the factor $\psi_{S - \{i\}}$ is bounded by $\|\psi_S\|$.

This assumption is reasonable because as soon as $\psi_S(x_S) = 0$ we can infer that $\psi_{S - \{i\}}(x_{S - \{i\}}) = 0$.
Note also that if we use the listing representation for $\psi_{S - \{i\}}$, then we can compute entries of $\psi_{S - \{i\}}$ using
at most $\|\psi_S\|$ conditional queries.
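
A sketch of product marginalization on the listing representation follows. It is illustrative only: the function name, the dictionary encoding, and the default multiplication are assumptions made here. The key point from the text is that a projected tuple has a non-zero product only if every value of $X_i$ appears with it, so one pass over the listed entries suffices and the number of $\otimes$ operations stays below $\|\psi_S\|$.

```python
def product_marginalize(entries, pos, dom_size, mul=lambda a, b: a * b):
    """Marginalize the coordinate at index `pos` out of a listed factor by product.

    `entries` maps value tuples to non-zero factor values and `dom_size` is
    |Dom(X_i)| for the eliminated variable. A projected tuple is kept only if
    all `dom_size` values of X_i are listed for it; otherwise some entry is 0
    and the product vanishes.
    """
    groups = {}
    for t, v in entries.items():
        key = t[:pos] + t[pos + 1:]
        if key in groups:
            acc, count = groups[key]
            groups[key] = (mul(acc, v), count + 1)  # one product operation per merge
        else:
            groups[key] = (v, 1)
    return {key: acc for key, (acc, count) in groups.items() if count == dom_size}
```

The `mul` parameter is a placeholder for the product operator $\otimes$ of whichever domain is in use.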

15 We order the domain arbitrarily if there is no natural total order; e.g. false < true for a Boolean problem, or blue < green 
< red in a coloring problem. 
Appendix G.1 explains why the listing representation satisfies both of the above assumptions. By
re-examining the previous runtime analysis of InsideOut, one can check that in the more general factor oracle
model, one can prove the following generalization of Theorem 5.5:

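Theorem 8.1. Suppose the Conditional query assumption and Product-marginalization assumption are
satisfied. Using notations established in Theorem 5.5, the InsideOut algorithm applied to $\varphi$ (Algorithm 1)
satisfies the following properties.

(i) The number of conditional queries to the factor oracles is at most

\[ \sum_{k \in K} |U_k| \cdot |\{ S \in \mathcal{E} : S \cap U_k \neq \emptyset \}| \cdot \mathrm{AGM}(U_k) + \sum_{\substack{k \notin K \\ S \in \partial(k)}} \|\psi_S\| + \sum_{\substack{k \notin K \\ S \in \mathcal{E}_k - \partial(k)}} \|\psi_S\| \cdot \mathrm{Idem}(\psi_S, k) + f(f + m)\|\varphi\|. \tag{29} \]

(ii) The total number of $\oplus$ operations performed is at most $\sum_{k \in K} |U_k| \cdot \mathrm{AGM}(U_k)$.

(iii) The number of $\otimes$ operations performed is at most

\[ \sum_{k \in K} \big( |\{ S \in \mathcal{E} : S \cap U_k \neq \emptyset \}| - 1 \big) \cdot \mathrm{AGM}(U_k) + \sum_{\substack{k \notin K \\ S \in \partial(k)}} \|\psi_S\| + \sum_{\substack{k \notin K \\ S \in \mathcal{E}_k - \partial(k)}} \|\psi_S\| \cdot \mathrm{Idem}(\psi_S, k) + f(f + m)\|\varphi\|. \tag{30} \]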


8.2 Input representations parsimoniously reducible to FAQ 

We briefly mention other representation choices for input factors. More details (including formal definitions) 
can be found in Appendix G. 

We begin with a representation that is more wasteful than the listing representation: the “truth table”
representation, which is the norm in dense matrix computation as well as inference in probabilistic graphical
models. The difference from the listing representation is that tuples that result in a 0 are also explicitly listed.
It is trivial to convert this representation into the listing representation.

Beyond the truth tables, other known representations are more succinct than the listing representation. 
In the CSP literature, these include generalized disjunctive normal form (GDNF) that was first considered by 
Chen and Grohe [25] and decision diagram representation of Chen and Grohe [25], which is a generalization 
of the well-studied ordered binary decision diagrams or OBDDs. In the database literature, Olteanu and 
Zavodny [73] considered factorized representations for conjunctive queries. Fast matrix vector multiplication 
has many examples of more succinct representations than the listing representation (e.g. the DFT). In the 
PGM world, Algebraic Decision Diagrams (or ADDs) were introduced in Bahar et al. [12].

A common property of all of the above succinct representations is that they can all be reduced to the 
listing representation at the expense of having to solve a more complicated FAQ problem. In particular, this 
leads to the problem of composing FAQ problems, which is discussed in Section 8.5. 

8.3 Input representations not parsimoniously reducible to FAQ 

Thus far we have analyzed the runtime of the InsideOut algorithm in terms of the sizes of the input factors,
which are the number of non-zero points of each input factor. If the input factors were represented using the
listing representation, then $O(|S| \cdot \|\psi_S\|)$ is precisely the input size for the factor $\psi_S$ (assuming each functional
value is of size $O(1)$). However, in some applications the input factors are much more compactly represented,
and the representation of $\psi_S$ may be exponentially smaller than $\|\psi_S\|$. The canonical example is the SAT
problem, where a CNF clause $\psi_S$ has size $\|\psi_S\| = 2^{|S|} - 1$, yet is represented using only $|S|$ bits. Note that in
this case we could reduce the input to the listing representation but this would suffer an exponential blowup
in size, which is too expensive. In the rest of the section, we outline how we can modify our framework to
handle this succinct representation of SAT and #SAT problems.
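
The size gap is easy to see on a small clause. The sketch below is illustrative (the encoding of a clause as a dictionary from variable to literal polarity is an assumption made here): it expands a clause into its listing representation, which holds every assignment except the single falsifying one.

```python
from itertools import product


def clause_listing(literals):
    """List all satisfying assignments of a clause.

    `literals` maps a variable to True for a positive literal and False for a
    negated one. Assignments are tuples of booleans in sorted variable order.
    """
    variables = sorted(literals)
    return [
        assignment
        for assignment in product([False, True], repeat=len(variables))
        if any(value == literals[v] for v, value in zip(variables, assignment))
    ]
```

A clause on $|S|$ variables is described by $|S|$ polarities, while its listing has $2^{|S|} - 1$ tuples.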

The conditional-search OutsideIn algorithm is no longer a good choice for such compact input
representation, because its runtime depends heavily on the number of non-zero elements in each input factor. Also,
we cannot expect the same sort of results from Section 5 to hold for SAT or #SAT, because even if the
input query was $\alpha$-acyclic the problem is still NP-hard or #P-hard, respectively. (Recall that $\alpha$-acyclic
hypergraphs have fhtw = 1.) Recently, however, there were a couple of interesting results showing that SAT
and #SAT are tractable for $\beta$-acyclic queries [18,74].

In the relational join problem, there is also a phenomenon discovered recently where the computational 
difficulty we face comes from precisely the same problem of compact input factor representations. In [65], 
modulo a technical assumption, we showed that the minimum number of comparisons a comparison-based 
join algorithm performs is in the same order as the minimum number of rectangular boxes of a certain format 
needed to cover the entire output space. We designed an algorithm called Minesweeper whose core
functionality is to determine whether or not a set of rectangular boxes covers the entire output space. In [2], the
Tetris algorithm relaxed the assumption about the input boxes, and formally defined this box cover problem
(or BCP). In both cases, BCP's difficulty lies precisely in the fact that the boxes represent points not in the
output, and thus a conditional-search-style algorithm such as OutsideIn does not work well. In Minesweeper
and Tetris, the collection of supports of the rectangular boxes forms a hypergraph. Similar to the SAT
and #SAT case, if this hypergraph is only $\alpha$-acyclic, then the box cover problem is NP-hard. But, if it is
$\beta$-acyclic then there is a linear time algorithm solving the box cover problem.

All of the four algorithms can be explained using the InsideOut/OutsideIn duality framework. The
InsideOut algorithm remains the same - it is just variable elimination - but we have to tailor the OutsideIn
algorithm to suit the compact encoding. In particular, BCP, SAT, and #SAT can all be reduced to FAQ
instances in which each of the input factors $\psi_S$ is of a special form called the box factor.


Definition 8.2 (Box factor). A box with support $S = \{i_1, \ldots, i_s\}$ is a tuple $B = (I_{i_1}, \ldots, I_{i_s})$ where
$I_{i_j} \subseteq \mathrm{Dom}(X_{i_j})$ is an interval in the domain $\mathrm{Dom}(X_{i_j})$. The box is a set of points $x \in \prod_{j=1}^{s} \mathrm{Dom}(X_{i_j})$ for
which $x_{i_j} \in I_{i_j}$ for all $j \in [s]$. A factor $\psi_S : \prod_{i \in S} \mathrm{Dom}(X_i) \to D$ is called a box factor if there is a box $B$
and a $c \in D$ for which

\[ \psi_S(x) = \begin{cases} c & \text{if } x \in B, \\ 1 & \text{if } x \notin B. \end{cases} \]

In SAT and #SAT, each CNF clause is a box factor. For example, in the clause $(X_1 \vee \bar{X}_2 \vee X_3)$, the box
is ({false}, {true}, {false}) (with $c$ = false). For SAT the domain $D$ is {true, false}. For #SAT the domain is
$\mathbb{R}_+$. Each rectangular box in BCP is clearly a box factor with Boolean domain, and looking for a point not
covered by all the boxes is the same as solving the corresponding FAQ instance.
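
The correspondence between clauses and box factors can be sketched as follows. The encoding is an assumption made for illustration: a clause is a dictionary from variable to literal polarity, a box is a dictionary from variable to the set of allowed values, and the multiplicative identity of the domain is passed explicitly as `one`.

```python
def clause_box(literals):
    """Box of a clause: for each variable, the single value that falsifies its literal.

    `literals` maps a variable to True for a positive literal, False for a negated one.
    """
    return {v: {not positive} for v, positive in literals.items()}


def box_factor(box, c, one):
    """Return psi with psi(x) = c if the point x lies in the box, and `one` otherwise.

    `x` is a dict mapping each variable in the support to its value.
    """
    def psi(x):
        inside = all(x[v] in allowed for v, allowed in box.items())
        return c if inside else one

    return psi
```

For the clause $(X_1 \vee \bar{X}_2 \vee X_3)$ the box is ({false}, {true}, {false}); with `c=False` and `one=True` the resulting factor is false exactly on the falsifying assignment.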

The first key to the algorithm in [74] for SAT, the algorithm in [18] for #SAT, Minesweeper in [65], and
Tetris in [2] is that we can still run variable elimination, but each of the intermediate factors $\psi_{U_k - \{v_k\}}$ is
computed and represented as a product of box factors. The second key observation is that, for $\beta$-acyclic
hypergraphs, there is a variable ordering $\sigma = (v_1, \ldots, v_n)$ called the nested elimination order (NEO) such
that, for every $k \in [n]$, the collection of sets in $\partial(v_k)$ forms a chain (see Proposition 4.10 and its proof
in Appendix C). This nesting allows us to keep the box factor representation of the intermediate factors
compact, i.e. each intermediate factor $\psi_{U_k - \{v_k\}}$ is a product of a “small” number of box factors.

Since Minesweeper and Tetris take a bit of work to set up properly, in the following section we explain 
the algorithms in [74] and [18] using this idea. 
8.3.1 InsideOut for SAT


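Consider the SAT problem, where we want to know if there exists one satisfying assignment. What does
“eliminating a variable” mean in this case? We have a set of clauses that contain the variable $X_n$. In each
such clause, $X_n$ occurs either as a positive literal $X_n$, or a negated variable $\bar{X}_n$. Let $\partial_P(X_n)$ be the set of
clauses containing $X_n$, and $\partial_N(X_n)$ be the set of clauses containing $\bar{X}_n$. Then, $\partial(X_n) = \partial_P(X_n) \cup \partial_N(X_n)$.
We want to construct a new boolean factor $\psi_{U_n - \{X_n\}}$ where

\begin{align*}
\psi_{U_n - \{X_n\}} &= \bigvee_{x_n} \bigwedge_{C \in \partial(X_n)} C|_{X_n = x_n} \\
&= \Big( \bigwedge_{C \in \partial(X_n)} C|_{X_n = \text{true}} \Big) \vee \Big( \bigwedge_{C \in \partial(X_n)} C|_{X_n = \text{false}} \Big) \\
&= \Big( \bigwedge_{C_N \in \partial_N(X_n)} C_N|_{X_n = \text{true}} \Big) \vee \Big( \bigwedge_{C_P \in \partial_P(X_n)} C_P|_{X_n = \text{false}} \Big) \\
&= \bigwedge_{C_N \in \partial_N(X_n),\, C_P \in \partial_P(X_n)} \big( C_N|_{X_n = \text{true}} \vee C_P|_{X_n = \text{false}} \big).
\end{align*}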


But $C_N|_{X_n = \text{true}}$ and $C_P|_{X_n = \text{false}}$ are simply the original clauses $C_N$ and $C_P$ with $X_n$ eliminated completely.
Hence, the rule is: for every clause $C_i$ containing $X_n$ and every clause $C_j$ containing $\bar{X}_n$, create a new clause
$C_{ij} = C_i \vee C_j$ without $X_n$ in it.$^{16}$ And we have a new instance of the SAT problem. This procedure is known
as the Davis-Putnam procedure for SAT-solving. Under this procedure, we do not need to enumerate all
$2^{|U_n|}$ truth assignments, so it might be faster depending on the input. On the other hand, we will run into a
combinatorial explosion problem in the number of clauses. For example, after one step we might have up to
$|\partial_P(X_n)| \cdot |\partial_N(X_n)|$ new clauses. Dechter and Rish [33] showed that this algorithm runs in time at most $O(n^3 9^w)$, where $w$ is
the appropriate notion of width for the SAT formula. Of course this is no better than the rough runtime of
$O(mn^2 2^w)$, but the advantage is that it creates a theory for the proof system. We will not delve any further
into this point.
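
The elimination rule above is short enough to sketch in code. The following is a minimal illustration of Davis-Putnam style variable elimination, not an implementation from the cited papers. It assumes clauses are encoded as sets of non-zero integers (a positive integer `v` for the literal $X_v$ and `-v` for its negation, in the style of the DIMACS format) and that no input clause is a tautology.

```python
def eliminate(clauses, v):
    """Eliminate variable v by resolving every positive clause with every negative one."""
    positive = [c for c in clauses if v in c]
    negative = [c for c in clauses if -v in c]
    remaining = {c for c in clauses if v not in c and -v not in c}
    for ci in positive:
        for cj in negative:
            resolvent = (ci - {v}) | (cj - {-v})
            # Skip tautologies, i.e. resolvents containing a literal and its negation.
            if not any(-lit in resolvent for lit in resolvent):
                remaining.add(frozenset(resolvent))
    return remaining


def satisfiable(clauses, variables):
    """Decide satisfiability by eliminating the variables one at a time."""
    clauses = {frozenset(c) for c in clauses}
    for v in variables:
        if frozenset() in clauses:  # empty clause derived: unsatisfiable
            return False
        clauses = eliminate(clauses, v)
    return frozenset() not in clauses
```

The nested loop makes the possible blowup visible: one elimination step can add as many resolvents as there are pairs of a positive and a negative clause.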

What is important to note here is that the representation of the intermediate factor $\psi_{U_n - \{X_n\}}$ is a product
of box factors. One interesting class of input CNF formulas is the class of $\beta$-acyclic CNF formulas. Under
this class, if the vertex ordering is a nested elimination order, then we don't have the clause (or box factor)
explosion problem. This is due to the fact that the set $\partial(X_k)$ forms a chain, so a variable set in $\partial(X_k)$ is a
subset of another variable set in $\partial(X_k)$. When the variables in $C_i$ are a subset of variables in $C_j$, the new
clause $C_{ij}$ is either a tautology (which can be removed), or is equal to $C_j$. (When the resolvent is a subset of
one of the two resolvers, the resolution is called subsumption resolution.) Hence, every time we eliminate a
variable $X_n$, we do not increase the number of clauses. So we have a polynomial-time algorithm for solving
$\beta$-acyclic CNF formulas, a result that was only proved recently [74].

Theorem 8.3 (Ordyniak, Paulusma, and Szeider [74]). SAT is polynomial-time solvable for the class of
$\beta$-acyclic CNF formulas.


8.3.2 InsideOut for #SAT 

For the #SAT problem, each clause corresponds to a function to $\{0, 1\}$. In particular, #SAT can be viewed
as an FAQ-SS over the semiring $(\mathbb{R}_+, +, \times)$ where each clause $C$ (whose variables are $\mathrm{vars}(C)$) corresponds
to a factor $\psi_{\mathrm{vars}(C)}$ defined as

\[ \psi_{\mathrm{vars}(C)}(x_{\mathrm{vars}(C)}) := \begin{cases} 1 & \text{if } x_{\mathrm{vars}(C)} \text{ satisfies } C, \\ 0 & \text{otherwise.} \end{cases} \]


16 In proof complexity, the clause $C_{ij}$ is called a resolvent of the two clauses $C_i$ and $C_j$.

We will be working with a slightly more general version of #SAT, called #WSAT, where each clause $C$ has an associated weight ($\mathrm{weight}(C) \in \mathbb{R}_+$) such that the corresponding factor $\psi_{\mathrm{vars}(C)}$ becomes

$$\psi_{\mathrm{vars}(C)}(\mathbf{x}_{\mathrm{vars}(C)}) := \begin{cases} 1 & \text{if } \mathbf{x}_{\mathrm{vars}(C)} \text{ satisfies } C, \\ \mathrm{weight}(C) & \text{otherwise.} \end{cases}$$

(Obviously, if each clause has weight 0, then #WSAT reduces back to #SAT. Further, note that this is a re-statement of the box factor formulation from Definition 8.2 specialized to this problem.) We chose to work with #WSAT because eliminating a variable from an $n$-variable #WSAT instance results in an $(n-1)$-variable #WSAT instance, as we will see shortly.
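
To make the #WSAT objective concrete, here is a brute-force evaluator. It is a sketch under stated assumptions: a clause contributes 1 when satisfied and its weight otherwise (the reading of the factor definition that is consistent with "weight 0 gives #SAT"), literals are encoded as nonzero integers, and the name `wsat_value` is hypothetical. It enumerates all $2^n$ assignments, so it is meant for checking small cases only.

```python
from itertools import product


def clause_satisfied(clause, assignment):
    """`assignment` maps variable index -> bool; literal v > 0 means X_v,
    literal v < 0 means the negation of X_{|v|}."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def wsat_value(weighted_clauses, num_vars):
    """Sum over all assignments of the product of clause factors, where a
    factor is 1 if the clause is satisfied and weight(C) otherwise."""
    total = 0.0
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        term = 1.0
        for clause, weight in weighted_clauses:
            term *= 1.0 if clause_satisfied(clause, assignment) else weight
        total += term
    return total
```

With all weights equal to 0 the value is the number of satisfying assignments; for the single clause $X_1 \vee X_2$ with weight 0.5, the three satisfying assignments contribute 1 each and the falsifying one contributes 0.5.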

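Eliminating a variable $X_n$ in #WSAT means defining a set $\mathcal{C}'_n$ of weighted clauses over the variables $U_n - \{X_n\}$ such that for all $\mathbf{x}$

$$\prod_{C' \in \mathcal{C}'_n} \psi_{\mathrm{vars}(C')}(\mathbf{x}_{\mathrm{vars}(C')}) = \underbrace{\sum_{x_n} \prod_{C \in \partial(X_n)} \psi_{\mathrm{vars}(C)}(\mathbf{x}_{\mathrm{vars}(C)})}_{\psi_{U_n - \{X_n\}}(\mathbf{x}_{U_n - \{X_n\}})}. \qquad (31)$$

A clause $C'$ is called monochromatic with respect to a set of (weighted) clauses $\mathcal{C}$ if for every clause $C \in \mathcal{C}$, either $(C \Rightarrow C')$ or $(C \vee C' \equiv \text{true})$. The color of $C'$ with respect to $\mathcal{C}$, denoted by $\mathrm{color}_{\mathcal{C}}(C')$, is defined as:

$$\mathrm{color}_{\mathcal{C}}(C') := \prod_{C \in \mathcal{C},\ C \Rightarrow C'} \mathrm{weight}(C).$$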

Given a clause $C$ and a variable $X \in \mathrm{vars}(C)$, we will be using $[C]_{-X}$ to denote the clause that results from $C$ by dropping either the literal $X$ or $\bar{X}$ (whichever occurs in $C$). Let $\mathcal{C}_n$ be the set of clauses that contained $X_n$ but with $X_n$ now removed from them, i.e.

$$\mathcal{C}_n := \{ [C]_{-X_n} \mid C \in \partial(X_n) \}.$$

Let $\pi_n = (X_{i_1}, X_{i_2}, \ldots, X_{i_t})$ be some arbitrarily-fixed order of $U_n - \{X_n\}$. We can define $\mathcal{C}'_n$ to be the set of clauses that are "minimal" w.r.t. $\pi_n$ and monochromatic w.r.t. $\mathcal{C}_n$. Formally,

$$\mathcal{C}'_n := \left\{ C' \text{ is a clause} \;\middle|\; \begin{array}{l} \mathrm{vars}(C') = \{X_{i_1}, X_{i_2}, \ldots, X_{i_\ell}\} \text{ for some } \ell \in [t], \text{ and} \\ C' \text{ is monochromatic w.r.t. } \mathcal{C}_n \text{ while } [C']_{-X_{i_\ell}} \text{ is not} \end{array} \right\}. \qquad (32)$$

Notice that for each $C' \in \mathcal{C}'_n$, both $C' \vee X_n$ and $C' \vee \bar{X}_n$ are monochromatic w.r.t. $\partial(X_n)$. Let

$$\begin{aligned} \mathrm{weight}(C') &:= \mathrm{color}_{\partial(X_n)}(C' \vee X_n) + \mathrm{color}_{\partial(X_n)}(C' \vee \bar{X}_n) \\ &= \mathrm{color}_{\partial^P(X_n)}(C' \vee X_n) + \mathrm{color}_{\partial^N(X_n)}(C' \vee \bar{X}_n). \end{aligned}$$

It is not hard to verify that the above definition of $\mathcal{C}'_n$ indeed satisfies (31). However, the size of $\mathcal{C}'_n$ could be larger than $|\partial(X_n)|$, and $|\mathcal{C}'_i|$ for $i < n$ could be even much larger.

$\beta$-acyclic case: When the CNF formula is $\beta$-acyclic, it is convenient to work with a nested elimination order. Let $X_n$ be the last variable in such an order. Thanks to Proposition 4.10, we can choose $\pi_n = (X_{i_1}, X_{i_2}, \ldots, X_{i_t})$ such that for each clause $C \in \partial(X_n)$,

$$\mathrm{vars}(C) = \{X_n\} \cup \{X_{i_1}, X_{i_2}, \ldots, X_{i_k}\} \text{ for some } k \in [t].$$

While the above looks very convenient, if we define $\mathcal{C}'_n$ as in (32), we might end up with a clause $C' \in \mathcal{C}'_n$ for which $\mathrm{vars}(C') \neq \mathrm{vars}(C) - \{X_n\}$ for every $C \in \partial(X_n)$. After eliminating $X_n$, the remaining hypergraph might not be $\beta$-acyclic anymore because of the new hyperedge $\mathrm{vars}(C')$.

To remedy the situation, let $(C_1, \ldots, C_{|\partial(X_n)|})$ be the clauses of $\partial(X_n)$ sorted in ascending order by $|\mathrm{vars}(C_i)|$. In particular, for each $i < j \in \{1, \ldots, |\partial(X_n)|\}$, we have

$$|\mathrm{vars}(C_i)| \le |\mathrm{vars}(C_j)| \iff \mathrm{vars}(C_i) \subseteq \mathrm{vars}(C_j).$$

(Ties can be broken arbitrarily. The above equivalence follows from Proposition 4.10 along with the fact that $X_n$ is the last in NEO.) Define

$$\begin{aligned} \partial^P_{\le i}(X_n) &:= \partial^P(X_n) \cap \{C_j\}_{j \le i}, & \partial^N_{\le i}(X_n) &:= \partial^N(X_n) \cap \{C_j\}_{j \le i}, \\ \partial^P_{< i}(X_n) &:= \partial^P(X_n) \cap \{C_j\}_{j < i}, & \partial^N_{< i}(X_n) &:= \partial^N(X_n) \cap \{C_j\}_{j < i}. \end{aligned}$$

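We will choose $\mathcal{C}'_n := \{C'_0, C'_1, \ldots, C'_{|\partial(X_n)|}\}$, where $C'_0$ is an empty clause whose weight is 2, and for each $i \in \{1, \ldots, |\partial(X_n)|\}$, $C'_i := [C_i]_{-X_n}$ has weight

$$\mathrm{weight}(C'_i) := \begin{cases} 0 & \text{if } \mathrm{color}_{\partial^P_{<i}(X_n)}(C'_i \vee X_n) + \mathrm{color}_{\partial^N_{<i}(X_n)}(C'_i \vee \bar{X}_n) = 0, \\[6pt] \dfrac{\mathrm{color}_{\partial^P_{\le i}(X_n)}(C'_i \vee X_n) + \mathrm{color}_{\partial^N_{\le i}(X_n)}(C'_i \vee \bar{X}_n)}{\mathrm{color}_{\partial^P_{<i}(X_n)}(C'_i \vee X_n) + \mathrm{color}_{\partial^N_{<i}(X_n)}(C'_i \vee \bar{X}_n)} & \text{otherwise.} \end{cases}$$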


While the above definition of $\mathcal{C}'_n$ still satisfies (31), we now have for each $C'_i \in \mathcal{C}'_n$, either $\mathrm{vars}(C'_i) = \emptyset$ or $\mathrm{vars}(C'_i) = \mathrm{vars}(C_i) - \{X_n\}$ for $C_i \in \partial(X_n)$. Hence after eliminating $X_n$, the remaining hypergraph is still $\beta$-acyclic. Moreover, $|\mathcal{C}'_n| = |\partial(X_n)|$, which guarantees that after eliminating $X_n$, the resulting #WSAT instance has the same size as the original. We conclude that:

Theorem 8.4 (Brault-Baron, Capelli, and Mengel [18]). #SAT is polynomial-time solvable for the class of
$\beta$-acyclic CNF formulas.


8.4 Output representation 

Our framework, in addition to being able to handle multiple input representations, is also able to handle 
different output representations. In particular, we describe how one can modify InsideOut to obtain different 
output representations. 

Inspired by the factorized database ideas developed by Olteanu and Zavodny [73], the extension to 
aggregates in Bakibayev et al. [13], and inspired by the discussion in Section 8.2 regarding using an FAQ- 
instance as an input factor, we observe that an FAQ-expression not only defines a new function, but also 
stores the computation needed to compute that function. There is a spectrum of tradeoffs one can explore 
in terms of the time it takes to query into a function stored in a “compressed” form and the time it takes to 
decompress it. 

In order to explain this idea more cleanly, and to simplify the presentation a bit, we will only consider FAQ-SS instances, i.e.

$$\varphi(\mathbf{x}_F) = \bigoplus_{\mathbf{x}_{[n]-F}} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S),$$

where $F = [f]$ for some $f \in [n]$. Furthermore, we will assume that $D = \{0,1\}$.

We now illustrate how we can modify InsideOut so that we can represent the output in three different 
output representations (in addition to the default listing representation). We will compare these output 
representations across the following three axes: 

1. Output pre-processing time: This is the time needed to compute the output in the required form; 

2. Value query time: This is the time needed to check if $\varphi(\mathbf{y}) = 1$ given the query $\mathbf{y} \in \prod_{i=1}^{f} \mathrm{Dom}(X_i)$;
and

3. Enumeration delay: Given the output representation, how much time it takes between reporting two
consecutive output tuples, given that ultimately we want to list all output tuples.

Recall from Section 5.2.3 that InsideOut runs until it has eliminated the variables $X_n, \ldots, X_{f+1}$ and then runs OutsideIn on the resulting FAQ-SS instance $\varphi_f$ corresponding to $\mathcal{H}_f$. In two of the options below, we will see how we can change this last step. Further, we will only talk about the runtime of the algorithm after it has eliminated the variables $X_n, \ldots, X_{f+1}$.

Listing representation. This is the default behavior of InsideOut. After spending $O(N^{\mathrm{faqw}(\sigma)})$ time in eliminating variables $X_n, \ldots, X_{f+1}$, it runs OutsideIn on the resulting query (9) taking time $O(\mathrm{AGM}(F))$ to report the output $\varphi$ (where $\|\varphi\| \le \mathrm{AGM}(F)$). Here, the output pre-processing time is $O(\mathrm{AGM}(F))$, the value query time is $O(1)$, and we have a constant enumeration delay.

Next, we consider two options where we have a smaller output pre-processing time while we can still 
handle the value query and enumeration operations (at potentially a larger cost than above). 

FAQ representation. Another option is to just not do any output pre-processing nor eliminate the free variables. Note that the FAQ instance $\varphi_f$ constructed when InsideOut eliminates $X_{f+1}$ is a valid representation of the output. Further, the value query time is still $O(1)$ (by checking if for every $T \in \mathcal{E}_f$, $\psi_T(\mathbf{y}_T) = 1$). However, there is no way to construct an enumeration algorithm with a constant delay from this representation. Finally, this representation has the advantage that if one considers the input as an FAQ instance (see Section G.3), then there is a nice symmetry in that the output here is also an FAQ instance. Given the discussion in Section G.3, this generalizes the framework of Olteanu and Zavodny [73] from factorized representation to FAQ representation.

$O(1)$-delay enumeration representation. Another option is to eliminate free variables but avoid running the OutsideIn algorithm at the last step (i.e. drop line 19 of Algorithm 1). In this case, we already have the output stored in a factorized representation like that from the factorized database framework. If the output was only required to be in this form, then essentially our runtime is $O(N^{\mathrm{faqw}(\sigma)})$ (without the $+\|\varphi\|$ term).

In this output representation, the value query time is $O(1)$ (since one can check if for every $i \in [f]$, $\psi_{U_i}(\mathbf{y}_{U_i}) = 1$). Perhaps more importantly, one can design an $O(1)$-delay enumeration function. Assume that the $\psi_{U_i}$ are stored as a BTree/trie (with sort order $X_1, X_2, \ldots, X_f$). Start with the 'first' $x_1$ such that $\psi_{U_1}(x_1) = 1$ and then figure out the 'first' $x_2$ such that $\psi_{U_2}(x_1, x_2) = 1$ and so on. It is easy to check that there is $O(1)$ delay in outputting every two consecutive output tuples.
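
The enumeration idea can be sketched as follows. This is a simplified illustration, not the paper's data structure: each intermediate factor $\psi_{U_i}$ is modeled as a Python set of length-$i$ prefixes (a hypothetical stand-in for the BTree/trie), the domain is $\{0,1\}$ as assumed in this section, and the names `prefix_factors`, `value_query`, and `enumerate_output` are made up for the example. The key property being used is that every stored prefix extends to at least one output tuple, so the search never backtracks out of a dead end.

```python
def prefix_factors(output_tuples, f):
    """Build psi[i] = set of length-(i+1) prefixes of the output tuples.
    This plays the role of the factors psi_{U_1}, ..., psi_{U_f}."""
    return [{t[: i + 1] for t in output_tuples} for i in range(f)]


def value_query(psi, y):
    """Check phi(y) = 1 by testing every prefix of y against its factor."""
    return all(tuple(y[: i + 1]) in psi[i] for i in range(len(psi)))


def enumerate_output(psi, domain=(0, 1)):
    """Yield output tuples in lexicographic order by extending prefixes.
    With a constant number of free variables and a constant-size domain,
    the work between two consecutive outputs is constant."""
    f = len(psi)

    def extend(prefix):
        if len(prefix) == f:
            yield prefix
            return
        for value in domain:
            candidate = prefix + (value,)
            if candidate in psi[len(prefix)]:
                yield from extend(candidate)

    yield from extend(())
```

For example, with output tuples `{(0, 1), (1, 0), (1, 1)}` the generator produces them in that order, and `value_query` rejects `(0, 0)`.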

FAQ composition and message passing. Being able to efficiently enumerate entries of a function defined
by an FAQ expression might have applications, say in answering a conjunctive query in a database. Thus, 
the 0(l)-delay enumeration representation is a decent choice. However, as we mentioned in Section 8.2 
there are cases where we want to use the output of one FAQ-query to feed into the input of another FAQ- 
query. Thus, the output representation should allow us to efficiently answer conditional queries and product 
marginalization queries (e.g. universal quantifiers in logic applications). 

The problem is, we do not know in advance in which variable ordering the future conditional queries and product marginalization queries will be posed. If they were posed in the same order as $X_1, \ldots, X_f$, then the $O(1)$-delay representation is sufficient. We can easily answer a conditional query using the intermediate factors $\psi_{U_i}$, $i \in [f]$, efficiently in $O(1)$ time.

This is where we can use the playbook of the graphical model literature. The key reason that message passing (or belief propagation) is advantageous over variable elimination is that it prepares the graphical model for future (unknown) queries. The message passing algorithm is essentially variable elimination run in all directions at once. We compute the tree decomposition of the graph $\mathcal{H}_f$ (after using InsideOut to eliminate $X_i$, $i > f$). Then, we run the message passing algorithm on the bags of the tree decomposition. The overall complexity is still $O(N^{\mathrm{fhtw}(\mathcal{H})})$, but after convergence all the bags are in calibrated state, and they allow for answering conditional queries along any GYO elimination order of the tree decomposition.
We leave the tradeoffs involved in realizing such an idea for a future work; but we would like to point out that Olteanu and Zavodny [73] have taken a step toward this direction in the database domain.

8.5 Composition of FAQ instances

We consider how the fractional hypertree width changes when we compose FAQ instances. More precisely
we consider the following problem:

Let $\mathcal{H}^0 = (\mathcal{V}, \mathcal{E}^0)$ be a hypergraph. For every $e \in \mathcal{E}^0$, let $\mathcal{H}^1_e = (e, \mathcal{E}^1_e)$ be a hypergraph. Let $\mathcal{H}^1$ denote the collection of hypergraphs $\{\mathcal{H}^1_e\}_{e \in \mathcal{E}^0}$. Now consider the following composed hypergraph $\mathcal{H}^0 \circ \mathcal{H}^1$ with $\mathcal{V}$ as the set of nodes and the edge set $\mathcal{E}^{01} = \bigcup_{e \in \mathcal{E}^0} \mathcal{E}^1_e$. How does $\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1)$ behave with respect to $\mathrm{fhtw}(\mathcal{H}^0)$ and $\mathrm{fhtw}(\mathcal{H}^1_e)$ (or in terms of other widths of these hypergraphs) for $e \in \mathcal{E}^0$?


We begin with a simple observation: 

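Proposition 8.5. For every hypergraph $\mathcal{H}^0$ and a corresponding collection of hypergraphs $\mathcal{H}^1$,

$$\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1) \le \mathrm{fhtw}(\mathcal{H}^0) \cdot \max_{e \in \mathcal{E}^0} \rho^*(\mathcal{H}^1_e).$$

Proof. Let $(T, \chi)$ be a tree decomposition for $\mathcal{H}^0$ with $\rho^*$-width of $w = \mathrm{fhtw}(\mathcal{H}^0)$. It is easy to check that $(T, \chi)$ is also a valid tree decomposition for $\mathcal{H}^0 \circ \mathcal{H}^1$. Further, we claim that for every bag $B$ in $T$, $\rho^*_{\mathcal{H}^0 \circ \mathcal{H}^1}(B) \le w \cdot \max_{e \in \mathcal{E}^0} \rho^*(\mathcal{H}^1_e)$, which would complete the proof.

Finally, we argue the claim. Fix any bag $B$ and let $x_e$, for every $e \in \mathcal{E}^0$ such that $e \cap B \neq \emptyset$, be the optimal fractional edge cover for $B$ (i.e. $\sum_{e \in \mathcal{E}^0 : e \cap B \neq \emptyset} x_e \le w$). For every edge $e \in \mathcal{E}^0$, let $y^e_{e'}$, for every $e' \in \mathcal{E}^1_e$, be an optimal fractional edge cover for $\mathcal{H}^1_e$. Then it is easy to check that the following is a valid edge cover for $B$ using edges from $\mathcal{H}^0 \circ \mathcal{H}^1$: for every $e' \in \mathcal{E}^{01}$ such that $e' \cap B \neq \emptyset$, define

$$z_{e'} = \sum_{e \in \mathcal{E}^0 : e' \in \mathcal{E}^1_e} x_e \cdot y^e_{e'}.$$

The claim follows by noting that

$$\sum_{e' \in \mathcal{E}^{01} : e' \cap B \neq \emptyset} z_{e'} \le \left( \sum_{e \in \mathcal{E}^0 : B \cap e \neq \emptyset} x_e \right) \cdot \left( \max_{e \in \mathcal{E}^0 : B \cap e \neq \emptyset} \sum_{e' \in \mathcal{E}^1_e : e' \cap e \cap B \neq \emptyset} y^e_{e'} \right) \le w \cdot \max_{e \in \mathcal{E}^0} \rho^*(\mathcal{H}^1_e),$$

which completes the proof. □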


Remark 8.6. We note that the proof above gives a better bound than that stated in Proposition 8.5 for 
specific hypergraphs. However, we chose to present a uniform bound for all hypergraphs for its elegance. 


It is natural to wonder if one can improve upon the bound of Proposition 8.5 in the worst-case. In
particular, it is natural to wonder if we can prove a bound of the form

$$\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1) \le O\left( \mathrm{fhtw}(\mathcal{H}^0) \cdot \max_{e \in \mathcal{E}^0} \mathrm{fhtw}(\mathcal{H}^1_e) \right).$$

We argue next that such a bound is not achievable.

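Lemma 8.7. There exist a hypergraph $\mathcal{H}^0$ and a family of corresponding hypergraphs $\mathcal{H}^1$ such that there is an unbounded gap of $\Omega(|\mathcal{V}|)$ between $\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1)$ and $\mathrm{fhtw}(\mathcal{H}^0) \cdot \max_{e \in \mathcal{E}^0} \mathrm{fhtw}(\mathcal{H}^1_e)$.

Proof. Let $\mathcal{V} = \{a_1, \ldots, a_n, b_1, \ldots, b_n\}$. The hypergraph $\mathcal{H}^0$ has the following $n$ hyperedges: $e_i = \{a_1, \ldots, a_n, b_i\}$. Note that this is essentially a star graph with $\mathrm{fhtw}(\mathcal{H}^0) = 1$. Further, for every $i \in [n]$, the graph $\mathcal{H}^1_{e_i}$ is the star graph with $a_i$ as the center and $a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n, b_i$ as the leaves. Note that for every $i \in [n]$, $\mathrm{fhtw}(\mathcal{H}^1_{e_i}) = 1$.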

However, note that $\mathcal{H}^0 \circ \mathcal{H}^1$ has a $K_n$ as a subgraph (in particular the subgraph on $\{a_1, \ldots, a_n\}$ forms a clique). Further the only other edges are the $n$ "spokes" $(a_i, b_i)$ for $i \in [n]$. Since every edge has two vertices and some bag must contain the whole clique, it can be verified that $\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1) \ge n/2$, which completes the proof. □

We now come back to the question of proving an upper bound on $\mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1)$. We first note that the argument in the proof of Proposition 8.5 is wasteful since it does not make use of any tree decomposition representation of the hypergraphs in $\mathcal{H}^1$. Next we describe a simple algorithm that tries to take advantage of this choice.

We begin with an optimal $\rho^*$-width tree decomposition $(T, \chi)$ of $\mathcal{H}^0$ as in the proof of Proposition 8.5. Consider an arbitrary bag $B$ in this tree decomposition. For every edge $e \in \mathcal{E}^0$ such that $e \cap B \neq \emptyset$, let $(T_e, \chi_e)$ be an optimal $\rho^*$-width tree decomposition for $\mathcal{H}^1_e$. Root this tree at an arbitrary bag $r_e$. Build a new tree $T'$ as follows: for each bag $B$ we create a new bag $B'$ as follows. For every edge $e \in \mathcal{E}^0$ such that $e \cap B \neq \emptyset$, add $r_e$ to $B'$ and hang the rest of tree $T_e$ from $B'$. Using the argument in the proof of Proposition 8.5, one can argue that for every bag $B'$, $\rho^*_{\mathcal{H}^0 \circ \mathcal{H}^1}(B') \le \mathrm{fhtw}(\mathcal{H}^0) \cdot \max_{e \in \mathcal{E}^0} \mathrm{fhtw}(\mathcal{H}^1_e)$. Define $\chi'$ as follows: for every $t \in V(T)$, if $B = \chi(t)$, then $\chi'(t) = B'$. For every other vertex $t$ (that comes from some $T_e$), $\chi'(t) = \chi_e(t)$. However, $(T', \chi')$ is not a valid tree decomposition because it might not satisfy the running intersection property. We fix this with an obvious greedy 'patchup' phase. The final tree is $T'' = T'$, but we will modify the bags by defining a new map $\chi''$. For every vertex $v \in \mathcal{V}$, consider the sub-forest of $T''$ with vertices whose bags contain $v$. Add in the set of vertices $t'$ to this forest such that the resulting sub-graph is a tree, and for each such $t'$, set $\chi''(t') \leftarrow \chi''(t') \cup \{v\}$ (for every $t$, $\chi''(t)$ is initialized to $\chi'(t)$). Note that when the patchup phase is done, $(T'', \chi'')$ is indeed a valid tree decomposition for $\mathcal{H}^0 \circ \mathcal{H}^1$. It is not too hard to verify that this tree decomposition gives the following bound:

Lemma 8.8. The following holds:

$$\begin{aligned} \mathrm{fhtw}(\mathcal{H}^0 \circ \mathcal{H}^1) &\le \max_{t \in V(T'')} \rho^*_{\mathcal{H}^0 \circ \mathcal{H}^1}(\chi''(t)) \\ &\le \mathrm{fhtw}(\mathcal{H}^0) \cdot \max_{e \in \mathcal{E}^0} \mathrm{fhtw}(\mathcal{H}^1_e) + \max_{t \in V(T'')} \rho^*_{\mathcal{H}^0 \circ \mathcal{H}^1}(\chi''(t) - \chi'(t)). \end{aligned}$$

We finish with two remarks:

1. In the worst-case the algorithm will result in $(T'', \chi'') = (T, \chi)$ (after reducing $(T'', \chi'')$), in which case we are back in the proof of Proposition 8.5.

2. Lemma 8.8 for the hard instance in the proof of Lemma 8.7 translates into a bound of $n + 1$, which is essentially tight.

9 Concluding Remarks 

The algorithms and ideas developed in this paper are not just on paper. Implementations of join algorithms based on fractional hypertree width for counting graph patterns have shown that the theory predicts what happens in practice very well: they are faster than existing commercial systems by at least an order of magnitude [70] for selected queries. Far beyond graph patterns, we have implemented InsideOut within the commercial LogicBlox database system [9] with great performance results. Moreover, learning from the beautiful work of Olteanu and Schleich [72,83], we realized [66] that InsideOut can be used to train a large class of machine learning models inside the database. Our implementation showed orders of magnitude speedup over the traditional data modeler route of exporting the data and running it through R or Python.

Real-world implementation of InsideOut faces two additional hurdles. The first problem is that we do not just have materialized predicates as inputs, we also have predicates such as $a < b$, $a + b = c$, negations and so on. These predicates do not have a "size". To solve this problem, one solution is to set the "size" of those predicates to be $\infty$ while computing the AGM-bound. For instance, if we have a sub-query of the form $Q \leftarrow R(a, b), S(b, c), a + b = c$, where $R$ and $S$ are input materialized predicates of size $N$, then by setting the size of $a + b = c$ to be infinite, $\mathrm{AGM}(Q) = N^2$. This solution does not work for two reasons. (1) If we knew $a + b = c$, then it is easy to infer that $|Q| \le N$ and also to compute $Q$ in time $O(N)$: scan over tuples in $R$, use $a + b = c$ to compute $c$, and see if $(b, c) \in S$. In other words, the AGM-bound is no longer tight. (2) The solution may give an $\infty$-bound when the output size is clearly bounded. Consider, for example, the query $Q \leftarrow R(a), S(b), a + b = c$; in this case, $\{a, b, c\}$ is the only hyperedge covering vertex $c$ in the fractional edge cover. Our implementation at LogicBlox makes use of generalizations of AGM to queries with functional dependencies and immaterialized predicates (such as $a + b = c$). These new bounds are based on a linear program whose variables are marginal entropies [3,4].
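
The linear-time evaluation of $Q \leftarrow R(a, b), S(b, c), a + b = c$ described in point (1) can be sketched as below. This is an illustration only (the function name `join_with_sum` is hypothetical, and expected linear time relies on hash-set lookups): the predicate $a + b = c$ is used as a functional dependency that determines $c$ from each tuple of $R$, so at most $|R|$ output tuples can exist.

```python
def join_with_sum(R, S):
    """Evaluate Q(a, b, c) <- R(a, b), S(b, c), a + b = c.

    Each tuple (a, b) of R determines c = a + b, so one pass over R with a
    hash lookup into S suffices; the output has at most len(R) tuples."""
    s_lookup = set(S)
    return [(a, b, a + b) for (a, b) in R if (b, a + b) in s_lookup]
```

For instance, with `R = [(1, 2), (2, 2), (0, 5)]` and `S = [(2, 3), (2, 4), (5, 6)]`, the first two tuples of `R` survive and the third does not, because `(5, 5)` is not in `S`.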

The second problem is to select a good variable ordering to run InsideOut on. In principle, one does 
not have to use the AGM-bound or the bounds from [3,4] to estimate the cost of an FAQ subquery. If one 
were to implement InsideOut inside any RDBMS, one could poll that RDBMS’s optimizer to figure out the 
cost of a given variable ordering. However, there are n! variable orderings, and optimizer’s cost estimation 
is time-consuming. Furthermore, some subqueries have inputs which are intermediate results. Hence, it is 
much faster to compute a variable ordering minimizing the faqw of the query, defined on the bounds in [3,4]. 
As the problem is NP-hard, either an approximation algorithm (from Section 7) or a greedy heuristic suffices 
in our experience.

A More examples of problems reducible to FAQ

This section presents many examples showing how FAQ captures a wide range of problems, such as graphical 
model inference, matrix multiplication, constraint satisfaction, quantified conjunctive query evaluation, etc. 
Some of the reductions to FAQ-SS are already discussed in the seminal work of Dechter [30], Aji and McEliece 
[8], and Kohlas and Wilson [57]; other reductions to FAQ-SS are new; all reductions to the general FAQ 
problem (over multiple semirings) are new. We first present the examples for FAQ-SS by considering the 
following semirings. 

• $(\{\text{true}, \text{false}\}, \vee, \wedge)$: the Boolean semiring

• $(\mathbb{R}, +, \times)$: the sum-product semiring

• $(\mathbb{R}_+, \max, \times)$: the max-product semiring, which is essentially equivalent to the $(\mathbb{R}, \min, +)$ semiring.$^{17}$

• $(2^U, \cup, \cap)$: the set semiring, where $U$ is some set called the universe. In this case, $\emptyset$ is the additive identity, and $U$ itself is the multiplicative identity.

17 If the ranges of factors are non-negative, then we can take log of the factors to turn product into sum.

A.1 The Boolean semiring

Example A.1 (Satisfiability). Let $\varphi$ be a CNF formula over $n$ Boolean variables $X_1, \ldots, X_n$. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be the hypergraph of $\varphi$. Then, each clause of $\varphi$ is a factor $\psi_S$, and the question of whether $\varphi$ is satisfiable is the same as evaluating the constant function

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Note that in this case each factor is very compactly represented. The size of each factor $\psi_S$ is $O(|S|)$, i.e. linear in the number of variables that the factor is on. We will have much more to say about such compact representation in Sections 8 and G.

Example A.2 ($k$-colorability). Let $G = (V, E)$ be a graph. Define the following instance of FAQ-SS. Let $\mathcal{V} = \{X_v \mid v \in V\}$, $\mathcal{E} = E$, and $\mathrm{Dom}(X_v) = [k]$. Define a factor $\psi_{uv}$ for every edge $uv \in E$ to be the predicate $\psi_{uv}(c_1, c_2) = (c_1 \neq c_2)$. The question of whether $G$ is $k$-colorable is equivalent to evaluating the following constant function:

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{uv \in E} \psi_{uv}(x_u, x_v).$$

Even for non-constant $k$, each factor in this example is also very compactly represented, amounting to an
inequality.
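
As a sanity check of this reduction, the Boolean-semiring expression can be evaluated literally by brute force. The sketch below is exponential in the number of vertices and is only meant to mirror the formula (an OR over all assignments of an AND over edge factors); the name `is_k_colorable` is hypothetical and colors are taken from `range(k)` rather than $[k]$.

```python
from itertools import product


def is_k_colorable(vertices, edges, k):
    """Evaluate OR over assignments x of AND over edges uv of (x_u != x_v)."""
    vertices = list(vertices)
    index = {v: i for i, v in enumerate(vertices)}
    return any(
        all(x[index[u]] != x[index[v]] for u, v in edges)
        for x in product(range(k), repeat=len(vertices))
    )
```

A triangle, for example, is 3-colorable but not 2-colorable under this evaluation.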

Example A.3 (Boolean conjunctive query). The Boolean conjunctive query evaluation problem (BCQ) can be written as follows. The query $\Phi$ has input relation set $\mathrm{atoms}(\Phi)$. Each relation $R \in \mathrm{atoms}(\Phi)$ is on attribute set $\mathrm{vars}(R)$. We want to know if there exists a tuple satisfying all relations:

$$\Phi = \bigwedge_{R \in \mathrm{atoms}(\Phi)} R(\mathrm{vars}(R)).$$

Define a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of all attributes, and $\mathcal{E} = \{\mathrm{vars}(R) \mid R \in \mathrm{atoms}(\Phi)\}$. For each $S \in \mathcal{E}$ corresponding to relation $R$, there is a factor $\psi_S$ where $\psi_S(\mathbf{x}_S) = (\mathbf{x}_S \in R)$. Then, the problem is reduced to evaluating the constant function

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Note that in this case, the typical input relation encoding is to list all tuples that belong to a given input
relation. Thus, the inputs are not as compact as the first two examples.

Example A. 4 (Constraint satisfaction). The Constraint Satisfaction Problem (CSP) is reducible to FAQ-SS 
in the obvious way. Note that SAT, 3-colorability, BCQ are all special cases of CSP. 

To summarize, the subtle issue of input encodings already shows up in the above examples. In SAT each input factor $\psi_S$ is encoded using a clause of size $O(|S|)$. In BCQ we use the listing encoding, where each input factor is encoded with a table of entries whose functional values are true. The CSP problem as defined above is underspecified.

Example A.5 (Conjunctive query). In the general conjunctive query evaluation (CQE) problem, the query has existential quantifiers over a subset of variables in $[n] - F$. It should be obvious that conjunctive query evaluation is reduced to the following form of FAQ-SS. The problem is to compute the output function

$$\varphi(\mathbf{x}_F) = \bigvee_{\mathbf{x}_{[n]-F}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Example A.6 (Natural join query). The natural join query is the CQE problem when all variables are free. Essentially, a natural join query is a quantifier-free conjunctive query, after some pre-processing steps. The reason we need pre-processing is that, strictly speaking, in a conjunctive query the same relation might appear several times, and the same variable might appear more than once in the same atom. This point is immaterial at this point of our discussion.

Example A.7 (Coding theory). Here's a typical setting in coding theory. Let $q$ be a prime power, and $\mathbb{F}_q$ denote the field of order $q$. Let $n \ge k \ge 1$ be integers. Then, an $(n, k)_q$-code $C$ is simply a subset of $\mathbb{F}_q^n$ of size $q^k$. If every pair of codewords (i.e. members of $C$) have Hamming distance at least $d$, then we call the code an $(n, k, d)_q$-code. The List Recovery problem [48] in (modern) coding theory is the following problem. We are given, for each position $i \in [n]$ of the code, a subset $S_i \subseteq \mathbb{F}_q$ of symbols. The objective is to "recover" the set of all codewords $\mathbf{c} = (c_1, \ldots, c_n) \in C$ for which $c_i \in S_i$ for all $i$. (Technically, we also want to allow some slack, where $c_i \in S_i$ for only a fraction of the positions $i$; but that requirement is not important for our discussion here.) List recoverable codes are codes for which, if the $S_i$ are "small" then the list of codewords to be recovered (satisfying the above condition) is also "small." Of course, we also would like the recovery process to be as fast as possible.

The list recovery paradigm has found important applications. It should be noted that when $|S_i| = 1$ for all $i$ then we get back the list decoding problem. Powerful expander graphs are constructed using the Parvaresh-Vardy family of codes, precisely because they are list recoverable [49]. In [68], we used list recovery to construct very good group testing schemes.

In the list recovery problem, we have $n + 1$ factors: $\psi_{[n]}$, and $\psi_i$ for $i \in [n]$, where

$$\begin{aligned} \psi_{[n]}(\mathbf{c}) &= \text{true iff } \mathbf{c} \in C, \\ \psi_i(c) &= \text{true iff } c \in S_i. \end{aligned}$$

The FAQ-SS instance is

$$\varphi(c_1, \ldots, c_n) = \psi_{[n]}(c_1, \ldots, c_n) \wedge \bigwedge_{i=1}^{n} \psi_i(c_i).$$

A.2 The sum-product semiring 

Example A.8 (Complex network analysis). In complex network analysis we often want to count the number of occurrences of a given small (induced or non-induced) subgraph $H$ inside of a massive graph $G$. This problem is pervasive in social network and biological network analysis, where each occurrence of the subgraph is a "pattern" that one wants to mine from the network. A canonical example is the triangle counting problem, where we want to count the number of triangles in a given graph $G = (V, E)$. The number of triangles is used to compute the clustering coefficients [64,84] and transitivity ratio [75,87]. We reduce triangle counting to FAQ-SS as follows. The FAQ-SS's hypergraph is $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, where

$$\mathcal{V} = \{1, 2, 3\}, \qquad \mathcal{E} = \{\{1,2\}, \{1,3\}, \{2,3\}\}.$$

The domains are $\mathrm{Dom}(X_i) = V$, for $i \in [3]$. The factors are all the same: $\psi_{12} = \psi_{13} = \psi_{23} = \psi$, where

$$\psi(u, v) = \begin{cases} 1 & \text{if } \{u, v\} \in E,\ u < v, \\ 0 & \text{otherwise.} \end{cases}$$

(In terms of input encoding, the factor $\psi$ can be represented with a data structure of size $O(|E|)$, for example.) The problem is to compute the constant function

$$\varphi = \sum_{x_1 \in V} \sum_{x_2 \in V} \sum_{x_3 \in V} \psi(x_1, x_2) \cdot \psi(x_1, x_3) \cdot \psi(x_2, x_3).$$

The same strategy can be used to count the number of occurrences of $H$ in $G$, where $H$ is a fixed graph. For example, $H$ can be a $k$-clique.

Remark A.9. If we'd like to list induced subgraphs, there should be other factors indicating the non-existence of edges in the subgraph pattern. These factors correspond to inequalities, and they can be compactly represented.
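
The triangle-counting expression of Example A.8 can be evaluated literally as a triple sum. The sketch below is a cubic-time brute force that mirrors the formula rather than an efficient algorithm; the function name `count_triangles` is hypothetical, and vertices are assumed to be mutually comparable (for example integers) so that the condition $u < v$ makes sense.

```python
def count_triangles(vertices, edges):
    """Evaluate the sum over x1, x2, x3 of psi(x1,x2) * psi(x1,x3) * psi(x2,x3),
    where psi(u, v) = 1 iff {u, v} is an edge and u < v."""
    edge_set = {frozenset(e) for e in edges}

    def psi(u, v):
        return 1 if u < v and frozenset((u, v)) in edge_set else 0

    return sum(
        psi(x1, x2) * psi(x1, x3) * psi(x2, x3)
        for x1 in vertices
        for x2 in vertices
        for x3 in vertices
    )
```

The ordering condition forces $x_1 < x_2 < x_3$, so each triangle is counted exactly once; the complete graph on four vertices yields 4.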

Example A.10 (#SAT). In SAT we want a Boolean answer: "is the formula satisfiable or not?" In #SAT we want a more specific piece of information: "exactly how many satisfying assignments are there?" Let $\Phi$ be a CNF formula over $n$ Boolean variables $X_1, \ldots, X_n$. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be the hypergraph of $\Phi$. Then, for each clause $C$ of $\Phi$ there is a factor $\psi_S$ defined by

$$\psi_S(\mathbf{x}_S) = \begin{cases} 1 & \text{if } \mathbf{x}_S \text{ satisfies } C, \\ 0 & \text{otherwise.} \end{cases}$$

The problem of counting the number of satisfying assignments is the same as evaluating the constant function

$$\varphi = \sum_{\mathbf{x}} \prod_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Example A.11 (Permanent). #SAT is #P-complete. Another canonical #P-complete problem is Permanent, which is the problem of evaluating the permanent $\mathrm{perm}(A)$ of a given binary square matrix $A$. Let $S_n$ denote the symmetric group on $[n]$, then the permanent of $A = (a_{ij})_{i,j=1}^{n}$ is defined as follows.

$$\mathrm{perm}(A) := \sum_{\pi \in S_n} \prod_{i=1}^{n} a_{i\pi(i)}.$$

Note that $\mathrm{perm}(A)$ has exactly the same form as $\det(A)$ written using the Leibniz formula except that the signs are all 1.

The Permanent problem can be written in the sum-product form as follows. Say the input matrix is $A = (a_{ij})$; there is a singleton factor $\psi_i$ for each vertex $i$, where $\psi_i(j) = a_{ij}$. Then, there are $\binom{n}{2}$ factors $\psi_{jk}$ for $j \neq k \in [n]$, where $\psi_{jk}(x, y) = 1$ if $x \neq y$ and 0 if $x = y$. The problem is to evaluate the constant function

$$\varphi = \sum_{\mathbf{x}} \prod_{i \in [n]} \psi_i(x_i) \prod_{j \neq k} \psi_{jk}(x_j, x_k).$$
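
The following sketch checks, on small matrices, that the sum-product form agrees with the permutation definition of the permanent. Both functions are brute force and purely illustrative; the names `perm_definition` and `perm_faq` are hypothetical, and indices run over `range(n)` rather than $[n]$.

```python
from itertools import permutations, product
from math import prod


def perm_definition(A):
    """Permanent as a sum over permutations pi of prod_i A[i][pi(i)]."""
    n = len(A)
    return sum(prod(A[i][pi[i]] for i in range(n)) for pi in permutations(range(n)))


def perm_faq(A):
    """Permanent in sum-product form: sum over all x in [n]^n of the
    singleton factors A[i][x_i] times the pairwise factors (x_j != x_k)."""
    n = len(A)
    total = 0
    for x in product(range(n), repeat=n):
        term = prod(A[i][x[i]] for i in range(n))
        for j in range(n):
            for k in range(j + 1, n):
                term *= 1 if x[j] != x[k] else 0
        total += term
    return total
```

The pairwise inequality factors zero out every assignment that is not a permutation, which is why the two sums coincide.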

Example A.12 (Probabilistic graphical models). We consider probabilistic graphical models (PGMs) on discrete finite domains. Without loss of generality, we restrict ourselves to undirected graphical models (also called Markov Random Field, factor graph, Gibbs distribution, etc.). If the input model is directed, we moralize it to get an undirected model. The model can be represented by a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, where there are $n$ discrete random variables $X_1, \ldots, X_n$ on finite domains $\mathrm{Dom}(X_1), \ldots, \mathrm{Dom}(X_n)$ respectively, and $m$ factors (also called potential functions):

$$\psi_S : \prod_{i \in S} \mathrm{Dom}(X_i) \to \mathbb{R}_+.$$

Typically, we want to learn the model and perform inference from the model. 18 For example, we might want 
to 


• Compute the marginal distribution of some set of variables

• Compute the conditional distribution $p(\mathbf{x}_A \mid \mathbf{x}_B)$ of some set of variables $A$ given specific values to another set of variables $\mathbf{x}_B$.

• Compute $\arg\max_{\mathbf{x}_A} p(\mathbf{x}_A \mid \mathbf{x}_B)$ (for MAP queries, for example).

18 From a Bayesian point of view, there is no difference between the two. "Learning" refers to the task of estimating the parameters of a model given the observed data. For example, we often estimate the parameters using the Maximum A Posteriori (MAP) estimate $\hat{\theta} = \arg\max_{\theta} p(\theta \mid \mathbf{x}_A) = \arg\max_{\theta} \{ \log p(\mathbf{x}_A \mid \theta) + \log p(\theta) \}$. Here, $\mathbf{x}_A$ is the set of observed values. (MAP is also called MPE which stands for most probable explanation, because it is the mode of the posterior distribution.) "Inference" refers to the task of computing $p(\mathbf{x}_B \mid \mathbf{x}_A, \theta)$, where $\mathbf{x}_B$ are the hidden variables and $\theta$ are known parameters.

When we condition on some variables, we can restrict the factors to only those entries that match the 
conditioned variables. It is obvious that the first two questions above are special cases of the FAQ-SS 
problem on the sum-product semiring. (The third question is on the max-product semiring.) These are 
well-known facts [58]. 

Since matrix-vector multiplication is a special case of matrix-matrix multiplication, Example 1.1 also 
shows that the matrix vector multiplication problem is a special case of the FAQ-SS problem. Given the 
above, it also follows that computing the Discrete Fourier Transform is a special case of FAQ-SS. Next, we 
present another interpretation of the DFT from Aji’s thesis [7], which immediately shows that our algorithm 
implies the Fast Fourier Transform (FFT). 

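Example A.13 (Discrete Fourier Transform). Recall that the discrete Fourier transform (DFT) is the matrix vector multiplication where $A_{xy} = e^{i 2\pi \frac{xy}{n}}$. For this example, we will consider the case when $n = p^m$ for some prime $p$ and integer $m \ge 1$. Recall that the DFT is defined as follows:

$$\varphi(x) = \sum_{y=0}^{n-1} b_y \cdot e^{i 2\pi \frac{xy}{n}}.$$

Write $x = \sum_{i=0}^{m-1} x_i \cdot p^i$ and $y = \sum_{i=0}^{m-1} y_i \cdot p^i$ in their base-$p$ form. Then we can re-write the transform above as follows:

$$\varphi(x_0, x_1, \ldots, x_{m-1}) = \sum_{(y_0, \ldots, y_{m-1}) \in \mathbb{F}_p^m} b_y \cdot e^{i 2\pi \frac{\sum_{0 \le j+k \le 2m-2} x_j \cdot y_k \cdot p^{j+k}}{p^m}} = \sum_{(y_0, \ldots, y_{m-1}) \in \mathbb{F}_p^m} b_y \cdot \prod_{0 \le j+k \le 2m-2} e^{i 2\pi \frac{x_j \cdot y_k \cdot p^{j+k}}{p^m}}.$$

Recalling that $n = p^m$, note that for any $j + k \ge m$, we have $e^{i 2\pi \frac{x_j \cdot y_k \cdot p^{j+k}}{p^m}} = 1$. Thus, the above is equivalent to

$$\varphi(x_0, x_1, \ldots, x_{m-1}) = \sum_{(y_0, \ldots, y_{m-1}) \in \mathbb{F}_p^m} b_y \cdot \prod_{0 \le j+k < m} e^{i 2\pi \frac{x_j \cdot y_k}{p^{m-j-k}}}.$$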

The above immediately suggests the following reduction to FAQ-SS. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V} = \{X_0, X_1, \ldots, X_{m-1}, Y_0, Y_1, \ldots, Y_{m-1}\}$ and $\mathcal{E}$ has an edge $(X_j, Y_k)$ for every $j, k \in \{0, 1, \ldots, m-1\}$ such that $j + k < m$. Further, there is another edge $(Y_0, Y_1, \ldots, Y_{m-1})$. The variable domains are $\mathrm{Dom}(X_i) = \mathrm{Dom}(Y_i) = \{0, 1, \ldots, p-1\}$. For every $j, k \in \{0, 1, \ldots, m-1\}$ such that $j + k < m$, the corresponding factor

$$\psi_{X_j,Y_k} : \mathbb{F}_p^2 \to \mathbf{D}$$

is defined as

$$\psi_{X_j,Y_k}(x, y) = e^{i 2\pi \frac{x \cdot y}{p^{m-j-k}}}.$$

Finally, the factor $\psi_Y : \mathbb{F}_p^m \to \mathbf{D}$ is defined as

$$\psi_Y(y_0, y_1, \ldots, y_{m-1}) = b_{(y_0, y_1, \ldots, y_{m-1})}.$$

Then the output is

$$\varphi(x_0, x_1, \ldots, x_{m-1}) = \sum_{(y_0,\ldots,y_{m-1}) \in \mathbb{F}_p^m} \psi_Y(y_0, \ldots, y_{m-1}) \cdot \prod_{0 \le j+k < m} \psi_{X_j,Y_k}(x_j, y_k).$$
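The digit-wise factorization above can be checked numerically on a small instance. The following is a minimal sketch, not part of the original development: the helper names (`digits`, `dft_direct`, `dft_factored`) and the test vector `b` are illustrative assumptions, and the evaluation is deliberately brute force.

```python
import cmath
from itertools import product

def digits(v, p, m):
    """Base-p digits of v, least significant first (v = sum d[i] * p**i)."""
    return [(v // p**i) % p for i in range(m)]

def dft_direct(b, n):
    """phi(x) = sum_y b[y] * exp(i*2*pi*x*y/n), straight from the definition."""
    return [sum(b[y] * cmath.exp(2j * cmath.pi * x * y / n) for y in range(n))
            for x in range(n)]

def dft_factored(b, p, m):
    """Same transform, using only the pairwise factors psi_{X_j,Y_k} with j + k < m."""
    n = p**m

    def psi(j, k, xj, yk):
        return cmath.exp(2j * cmath.pi * xj * yk / p**(m - j - k))

    out = []
    for x in range(n):
        xd = digits(x, p, m)
        total = 0
        for yd in product(range(p), repeat=m):
            y = sum(d * p**i for i, d in enumerate(yd))
            term = b[y]                      # the factor psi_Y(y_0, ..., y_{m-1})
            for j in range(m):
                for k in range(m - j):       # exactly the pairs with j + k < m
                    term *= psi(j, k, xd[j], yd[k])
            total += term
        out.append(total)
    return out

p, m = 2, 3
b = [1, 2, 0, -1, 3, 0.5, -2, 4]             # arbitrary test vector (assumption)
direct = dft_direct(b, p**m)
factored = dft_factored(b, p, m)
max_err = max(abs(u - v) for u, v in zip(direct, factored))
```

Both evaluations agree up to floating-point error, which is what dropping the factors with $j + k \geq m$ predicts.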


Example A.14 (Graph homomorphism function). Given a (non-hyper) graph $G$ and an $m \times m$ symmetric matrix $A = (a_{ij})$, the graph homomorphism function is defined by

$$Z_A(G) = \sum_{\xi : V \to [m]} \prod_{uv \in E(G)} a_{\xi(u),\xi(v)}.$$

(The reader is referred to the masterpieces by Cai et al. [20], Goldberg et al. [40], and references therein for a long and fascinating history of this problem.) The corresponding FAQ instance is

$$\varphi = \sum_{\mathbf{x}} \prod_{uv \in E(G)} \psi_{uv}(x_u, x_v),$$

where $\mathrm{Dom}(X_v) = [m]$, and $\psi_{uv} : [m] \times [m] \to \mathbf{D}$ are identical factors, mapping $\psi_{uv}(x_u, x_v) = a_{x_u,x_v}$.
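As a small illustration of the sum-product form above, the sketch below evaluates $Z_A(G)$ by brute force. The function name `hom_function` and the example graph are assumptions made for illustration; indices run from 0 to $m-1$ rather than over $[m]$.

```python
from itertools import product

def hom_function(vertices, edges, A):
    """Z_A(G): sum over all maps xi: V -> {0..m-1} of prod over edges uv of A[xi(u)][xi(v)]."""
    m = len(A)
    total = 0
    for assignment in product(range(m), repeat=len(vertices)):
        xi = dict(zip(vertices, assignment))
        weight = 1
        for u, v in edges:
            weight *= A[xi[u]][xi[v]]        # the factor psi_uv(x_u, x_v)
        total += weight
    return total

triangle_vertices = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
# A = J - I: an edge contributes 1 exactly when its endpoints receive different values.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
z = hom_function(triangle_vertices, triangle_edges, A)
```

With this choice of $A$, the function counts the proper 3-colorings of the triangle.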

Example A.15 (Holant problem). The Holant problems can also be expressed using FAQ-SS. Essentially, in a Holant instance there is a function $f_v$ for each vertex $v$ of a graph $G$, where the function is on the set of edges incident to $v$. Each edge can be assigned a domain value. The problem is to compute the sum over all edge assignments of the product of all vertex factors. See [21] for more details.

A.3 The max-product semiring 

In addition to MAP queries in PGMs which are well-known to be reducible to FAQ-SS on the max-product 
semiring, the following is another common instance of the max-product semiring. 

Example A.16 (Maximum Likelihood Decoder for Linear Codes). We present the instantiation of Maximum Likelihood Decoding (MLD) for linear codes as an instance of FAQ-SS from Aji and McEliece [8]. We consider the decoding problem for a discrete memoryless channel. We assume that the alphabet is $\mathbb{F}_2$ and, given $y, x \in \mathbb{F}_2$, $\psi_p(y, x)$ denotes the probability of the receiver receiving $y$ given that $x$ was transmitted over the channel. Let $C$ be a binary linear code with dimension $k$ and block length $n$. Then given a received word $\mathbf{y} \in \mathbb{F}_2^n$ and a codeword $\mathbf{c} \in C$, the probability that the receiver receives $\mathbf{y}$ when $\mathbf{c}$ was transmitted is given by

$$\Pr[\mathbf{y} \mid \mathbf{c}] = \prod_{i=1}^{n} \psi_p(y_i, c_i).$$

The maximum likelihood decoder, given $\mathbf{y} \in \mathbb{F}_2^n$, outputs the codeword $\mathbf{c} \in C$ that is most likely, i.e. it outputs

$$\arg\max_{\mathbf{c} \in C} \Pr[\mathbf{y} \mid \mathbf{c}].$$

For simplicity, we will concentrate on the related problem of computing the most likely probability, i.e.

$$\max_{\mathbf{c} \in C} \Pr[\mathbf{y} \mid \mathbf{c}]. \tag{33}$$

Define the hypergraph $\mathcal{H} = ([n], \mathcal{E})$ as follows. There is a singleton edge $\{i\}$ for every $i \in [n]$ (with the corresponding factor $\psi_i(Y_i, X_i) = \psi_p(Y_i, X_i)$). Further, the linear code $C$ has an $(n-k) \times n$ parity check matrix $H$ such that $\mathbf{c} \in C$ if and only if $H \cdot \mathbf{c}^T = \mathbf{0}$. Then for each row $H_j$ we have $\mathrm{supp}(H_j) \in \mathcal{E}$. (The factor $\psi_{H_j}(\mathbf{x}_{\mathrm{supp}(H_j)})$ is defined to be 1 if the corresponding parity check is satisfied by $\mathbf{x}$ and 0 otherwise.) It is easy to verify that the FAQ-SS instance below is equivalent to the problem in (33):

$$\varphi = \max_{\mathbf{x} \in \mathbb{F}_2^n} \prod_{i \in [n]} \psi_i(y_i, x_i) \cdot \prod_{j \in [n-k]} \psi_{H_j}(\mathbf{x}_{\mathrm{supp}(H_j)}).$$
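To make the max-product instance concrete, here is a minimal brute-force sketch. The binary symmetric channel with crossover probability 0.1, the parity check matrix of the length-3 repetition code, and all function names are assumptions chosen for illustration only.

```python
from itertools import product

def channel(y, x, eps=0.1):
    """psi_p(y, x) for a binary symmetric channel with crossover probability eps (assumed)."""
    return 1 - eps if y == x else eps

def parity_factor(row, x):
    """psi_{H_j}: 1 if the parity check given by `row` is satisfied by x, else 0."""
    return 1 if sum(h * xi for h, xi in zip(row, x)) % 2 == 0 else 0

def max_likelihood(H, y):
    """max over x in F_2^n of prod_i psi_i(y_i, x_i) * prod_j psi_{H_j}(x_supp(H_j))."""
    best = 0.0
    for x in product((0, 1), repeat=len(y)):
        value = 1.0
        for yi, xi in zip(y, x):
            value *= channel(yi, xi)
        for row in H:
            value *= parity_factor(row, x)   # zeroes out every non-codeword
        best = max(best, value)
    return best

H = [[1, 1, 0], [0, 1, 1]]   # parity checks of the repetition code {000, 111}
received = (1, 0, 1)
best = max_likelihood(H, received)
```

Here the codeword 111 wins with probability $0.9 \cdot 0.1 \cdot 0.9$, while 000 only reaches $0.1 \cdot 0.9 \cdot 0.1$.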


A.4 The set semiring 

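Example A.17 (Natural join query). Consider the natural join query

$$\Phi = \bowtie_{R \in \mathrm{atoms}(\Phi)} R.$$

Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be the query's hypergraph. We define a factor $\psi_S$ for each $S \in \mathcal{E}$ as follows. The domain of the set semiring is $\mathbf{D} = 2^U$, where $U = \prod_{i=1}^{n} \mathrm{Dom}(X_i)$. Let $R$ be the relation corresponding to $S$. Then, $\psi_S$ is defined by

$$\psi_S(\mathbf{x}_S) = \begin{cases} \{\mathbf{t} \mid \pi_S(\mathbf{t}) = \mathbf{x}_S\} & \text{if } \mathbf{x}_S \in R, \\ \emptyset & \text{if } \mathbf{x}_S \notin R. \end{cases}$$

Computing the output of $\Phi$ becomes computing the constant set-valued function

$$\varphi = \bigcup_{\mathbf{x}} \bigcap_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$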
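The set-semiring formulation above can be evaluated literally on a toy instance. The sketch below is intentionally naive (it materializes every $\psi_S(\mathbf{x}_S)$ as a set of full tuples) and is only meant to show that union-of-intersections yields the join; the relation contents and helper names are illustrative assumptions.

```python
from itertools import product

def psi(relation, scope, variables, domains, x):
    """psi_S(x_S): all full tuples t with pi_S(t) = x_S if x_S is in R, else the empty set."""
    xs = tuple(x[variables.index(v)] for v in scope)
    if xs not in relation:
        return frozenset()
    return frozenset(
        t for t in product(*domains)
        if tuple(t[variables.index(v)] for v in scope) == xs
    )

def join_via_set_semiring(variables, domains, atoms):
    """phi = union over x of the intersection over S of psi_S(x_S)."""
    universe = frozenset(product(*domains))
    result = set()
    for x in product(*domains):
        acc = universe                       # identity element of intersection
        for scope, relation in atoms:
            acc = acc & psi(relation, scope, variables, domains, x)
        result |= acc
    return result

variables = ["A", "B", "C"]
domains = [(1, 2), (1, 2), (1, 2)]
R = {(1, 1), (1, 2), (2, 2)}                 # R(A, B)
S = {(1, 2), (2, 1)}                         # S(B, C)
atoms = [(("A", "B"), R), (("B", "C"), S)]
joined = join_via_set_semiring(variables, domains, atoms)
```

The result coincides with the natural join of `R` and `S` computed by hand.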

A.5 Applications of the FAQ problem 

The FAQ-SS problem is quite general, as we have seen above. However, there are at least two classes of 
problems which are not captured by the basic FAQ-SS formulation above. 

First, consider the problem of counting the number of answers to a conjunctive query. This is called the #CQ problem [79]. If the input query is quantifier-free, then we can formulate the problem as FAQ-SS on the sum-product semiring by mapping true to 1 and false to 0. However, when the query does have existential quantifiers, the straightforward mapping does not work.

Second, consider the general Quantified Conjunctive Query (QCQ) problem, which has both existential 
and universal quantifiers (and a conjunction of atoms inside). It is not possible to reduce this problem to 
FAQ-SS. The related problem of counting the number of solutions to such formulas (#QCQ) is even harder. 

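Example A.18 (#CQ). Let $\Phi$ be a conjunctive formula of the form

$$\Phi(X_1, \ldots, X_f) = \exists X_{f+1} \cdots \exists X_n \left( \bigwedge_{R \in \mathrm{atoms}(\Phi)} R \right),$$

and the problem is to count the number of assignments $(x_1, \ldots, x_f)$ such that $\Phi(x_1, \ldots, x_f)$ is satisfied. Then, we can reduce this problem to FAQ with the constant function

$$\varphi = \sum_{x_1} \cdots \sum_{x_f} \max_{x_{f+1}} \cdots \max_{x_n} \prod_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Here, $\psi_S(\mathbf{x}_S) = \delta_{\mathbf{x}_S \in R}$, where $R \in \mathrm{atoms}(\Phi)$ is the atom corresponding to $S$. Note that $\sum$ and $\max$ do not commute with each other.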
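A minimal brute-force sketch of this reduction is given below. The query, the relations `R` and `S`, and the function name `count_cq` are illustrative assumptions; variables are indexed from 0 and the free variables come first.

```python
from itertools import product

def count_cq(domains, num_free, atoms):
    """Sum over the free variables of the max over the bound variables of prod_S psi_S(x_S)."""
    free_domains, bound_domains = domains[:num_free], domains[num_free:]
    total = 0
    for x_free in product(*free_domains):
        best = 0
        for x_bound in product(*bound_domains):
            x = x_free + x_bound
            value = 1
            for scope, relation in atoms:
                # psi_S(x_S) is the 0/1 indicator of x_S being in R
                value *= 1 if tuple(x[i] for i in scope) in relation else 0
            best = max(best, value)
        total += best
    return total

# Phi(X0) = exists X1, X2 : R(X0, X1) and S(X1, X2)
dom = (0, 1, 2)
R = {(0, 1), (0, 2), (1, 0), (2, 2)}
S = {(1, 0), (2, 1)}
atoms = [((0, 1), R), ((1, 2), S)]
num_answers = count_cq([dom, dom, dom], 1, atoms)

# Replacing max by sum counts satisfying full tuples instead of answers.
num_join_tuples = sum(
    all(tuple(x[i] for i in scope) in rel for scope, rel in atoms)
    for x in product(dom, repeat=3)
)
```

The two counts differ on this instance, which is why the existential variables need the max aggregate rather than the sum.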

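Example A.19 (QCQ - First Reduction). Let $\Phi$ be a first order formula of the form

$$\Phi(X_1, \ldots, X_f) = Q_{f+1} X_{f+1} \cdots Q_n X_n \left( \bigwedge_{R \in \mathrm{atoms}(\Phi)} R \right),$$

where $Q_i \in \{\exists, \forall\}$, for $i > f$. The problem is to compute the relation $\Phi$ on the free variables $X_1, \ldots, X_f$. Then, we can reduce this problem to FAQ with the function

$$\varphi(x_1, \ldots, x_f) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \prod_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Similar to Example A.18, $\psi_S(\mathbf{x}_S) = \delta_{\mathbf{x}_S \in R}$, where $R \in \mathrm{atoms}(\Phi)$ is the atom corresponding to $S$. And, $\oplus^{(i)} = \max$ if $Q_i = \exists$ and $\oplus^{(i)} = \min$ if $Q_i = \forall$.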

There is another way to reduce QCQ to FAQ. It is this reduction that we will use later in the paper. 
Example A.20 (QCQ - Second Reduction). Let $\Phi$ be a first order formula of the form

$$\Phi(X_1, \ldots, X_f) = Q_{f+1} X_{f+1} \cdots Q_n X_n \left( \bigwedge_{R \in \mathrm{atoms}(\Phi)} R \right),$$

where $Q_i \in \{\exists, \forall\}$, for $i > f$. The problem is to compute the relation $\Phi$ on the free variables $X_1, \ldots, X_f$. Then, we can reduce this problem to FAQ with the function

$$\varphi(x_1, \ldots, x_f) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \prod_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

Similar to Example A.18, $\psi_S(\mathbf{x}_S) = \delta_{\mathbf{x}_S \in R}$, where $R \in \mathrm{atoms}(\Phi)$ is the atom corresponding to $S$. And,

$$\oplus^{(i)} = \begin{cases} \max & \text{if } Q_i = \exists, \\ \times & \text{if } Q_i = \forall. \end{cases}$$

What is interesting to note about the above reduction is that the product occurs as an aggregate operator 
over (universally quantified) variables and also as an aggregate operator over the input factors. 
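The second reduction can be sketched on a tiny instance as follows. This is a hedged illustration only: the encoding of quantifiers as the strings "E" and "A", the example relations, and the function name `eval_qcq` are assumptions, and the evaluation is plain recursion rather than variable elimination.

```python
from itertools import product
from math import prod

def eval_qcq(domains, num_free, quantifiers, atoms):
    """Evaluate phi(x_1..x_f) with max for 'E' (exists) and product for 'A' (forall)."""
    n = len(domains)

    def aggregate(i, x):
        if i == n:
            # Product over the input factors psi_S(x_S), each a 0/1 indicator.
            return prod(1 if tuple(x[j] for j in scope) in rel else 0
                        for scope, rel in atoms)
        values = [aggregate(i + 1, x + (v,)) for v in domains[i]]
        return max(values) if quantifiers[i - num_free] == "E" else prod(values)

    return {x: aggregate(num_free, x) for x in product(*domains[:num_free])}

# Phi(X0) = forall X1 exists X2 : R(X0, X1) and S(X1, X2)
dom = (0, 1)
R = {(0, 0), (0, 1), (1, 0)}
S = {(0, 1), (1, 1)}
phi = eval_qcq([dom, dom, dom], 1, ["A", "E"], [((0, 1), R), ((1, 2), S)])
```

On 0/1 values the product over a universally quantified variable behaves like logical AND, so `phi` is the indicator of the relation $\Phi$.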

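Example A.21 (Example 1.3 revisited: #QCQ). This is similar to the above two examples. Let $\Phi$ be a first order formula of the form

$$\Phi(X_1, \ldots, X_f) = Q_{f+1} X_{f+1} \cdots Q_n X_n \left( \bigwedge_{R \in \mathrm{atoms}(\Phi)} R \right),$$

where $Q_i \in \{\exists, \forall\}$, for $i > f$. The problem is to count the number of tuples in relation $\Phi$ on the free variables $X_1, \ldots, X_f$. Then, we can reduce this problem to FAQ with the constant function

$$\varphi = \sum_{x_1} \cdots \sum_{x_f} \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \prod_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S),$$

where $\psi_S(\mathbf{x}_S) = \delta_{\mathbf{x}_S \in R}$, and $R \in \mathrm{atoms}(\Phi)$ is the atom corresponding to $S$. And,

$$\oplus^{(i)} = \begin{cases} \max & \text{if } Q_i = \exists, \\ \times & \text{if } Q_i = \forall. \end{cases}$$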


B Reductions from non-semiring to semiring 


In practice, we often encounter queries which have the same format as FAQ except that some of the aggregates $\oplus^{(i)}$ are neither product nor semiring aggregates. Suppose that we want to compute the function

$$\varphi : \prod_{i \in [f]} \mathrm{Dom}(X_i) \to \mathbf{D}$$

defined by

$$\varphi(\mathbf{x}_{[f]}) = \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n)}_{x_n} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

WLOG let $\oplus := \oplus^{(n)}$ be neither a product nor a semiring aggregate. (Otherwise, we could have eliminated $x_n$ as described in Section 5.2.) In many situations, it is possible to find a two-way mapping between $(\mathbf{D}, \oplus, \otimes)$ and some commutative semiring $(\bar{\mathbf{D}}, \bar{\oplus}, \bar{\otimes})$ where $\bar{\mathbf{D}}$ is some extended domain and $\bar{\oplus}, \bar{\otimes}$ are extended versions of $\oplus, \otimes$ that fit the new domain $\bar{\mathbf{D}}$. If such a mapping exists, then we can transfer input factors into $\bar{\mathbf{D}}$, carry out the calculations over the semiring $(\bar{\mathbf{D}}, \bar{\oplus}, \bar{\otimes})$, and then transfer the results back to the original domain $\mathbf{D}$, in order to bypass dealing with non-semirings.

Formally, suppose that there exist a function $f : \mathbf{D} \to \bar{\mathbf{D}}$ and a function $\bar{f} : \bar{\mathbf{D}} \to \mathbf{D}$ that satisfy the following conditions:

1. For all $x_1, \ldots, x_n \in \mathbf{D}$ (where $n \geq 1$), $\bigoplus_{i \in [n]} x_i = \bar{f}\left(\overline{\bigoplus}_{i \in [n]} f(x_i)\right)$.

2. For all $x, y \in \mathbf{D}$, $f(x \otimes y) = f(x) \,\bar{\otimes}\, f(y)$.

3. For all $\bar{x}, \bar{y} \in \bar{\mathbf{D}}$, $\bar{f}(\bar{x} \,\bar{\otimes}\, \bar{y}) = \bar{f}(\bar{x}) \otimes \bar{f}(\bar{y})$.

If the above conditions are met (Examples B.2 and B.3 below), then variable elimination (Section 5.1.2) can take place as if $(\mathbf{D}, \oplus, \otimes)$ were a commutative semiring.




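\begin{align}
\varphi(\mathbf{x}_{[f]}) &= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigoplus_{x_n} \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) \tag{34} \\
&= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bar{f}\left[ \overline{\bigoplus}_{x_n} f\left( \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S) \right) \right] \tag{35} \\
&= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bar{f}\left[ \left( \overline{\bigotimes}_{S \notin \partial(n)} f(\psi_S(\mathbf{x}_S)) \right) \bar{\otimes} \left( \overline{\bigoplus}_{x_n} \overline{\bigotimes}_{S \in \partial(n)} f(\psi_S(\mathbf{x}_S)) \right) \right] \tag{36} \\
&= \bigoplus^{(f+1)}_{x_{f+1}} \cdots \bigoplus^{(n-1)}_{x_{n-1}} \bigotimes_{S \notin \partial(n)} \psi_S(\mathbf{x}_S) \otimes \underbrace{\bar{f}\left( \overline{\bigoplus}_{x_n} \overline{\bigotimes}_{S \in \partial(n)} f(\psi_S(\mathbf{x}_S)) \right)}_{\text{new factor } \psi_{U_n - \{n\}} \text{ over } \mathbf{D}}. \tag{37}
\end{align}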


If only conditions 1 and 2 (but not 3) are met (Example B.4), then we can still apply steps (35) and (36) (but not (37)). In particular, the factors $\psi_S(\mathbf{x}_S)$ have to remain in their transformed form $f(\psi_S(\mathbf{x}_S))$ in the new domain $\bar{\mathbf{D}}$, and the new factor will be over $\bar{\mathbf{D}}$ as well. We cannot transfer back to the original domain $\mathbf{D}$ until all products $\bar{\otimes}$ have been computed.

Let $\mathbf{0}, \mathbf{1} \in \mathbf{D}$ denote the identities for $\oplus, \otimes$ respectively, and $\bar{\mathbf{0}}, \bar{\mathbf{1}} \in \bar{\mathbf{D}}$ denote the identities for $\bar{\oplus}, \bar{\otimes}$ respectively. Many natural FAQ algorithms rely on zero entries in order to save computation (e.g., OutsideIn and InsideOut in Sections 5.1.1 and 5.2 respectively). Therefore, it is desirable that $f$ and $\bar{f}$ satisfy the properties $\bar{\mathbf{0}} = f(\mathbf{0})$ and $\mathbf{0} = \bar{f}(\bar{\mathbf{0}})$.

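Remark B.1. Notice that conditions 1 and 3 above imply that $\bar{f}(\bar{\mathbf{1}}) = \mathbf{1}$. Indeed, assuming $\bar{f}(\bar{\mathbf{1}}) \neq \mathbf{1}$, we have $\bar{f}\left[f(\mathbf{1}) \,\bar{\otimes}\, \bar{\mathbf{1}}\right] = \bar{f}\left[f(\mathbf{1})\right] = \mathbf{1}$, while $\bar{f}\left[f(\mathbf{1})\right] \otimes \bar{f}(\bar{\mathbf{1}}) = \mathbf{1} \otimes \bar{f}(\bar{\mathbf{1}}) = \bar{f}(\bar{\mathbf{1}}) \neq \mathbf{1}$, which contradicts condition 3.

Here are a few practical examples that clarify the above abstract ideas. (In all of those examples, we have $\bar{\mathbf{0}} = f(\mathbf{0})$, $\mathbf{0} = \bar{f}(\bar{\mathbf{0}})$, $\bar{\mathbf{1}} = f(\mathbf{1})$, and $\mathbf{1} = \bar{f}(\bar{\mathbf{1}})$.)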

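Example B.2 (Average). Given $n \geq 1$ numbers $x_1, \ldots, x_n \in \mathbb{R}$, let $\mathrm{avg}(x_1, \ldots, x_n)$ denote the average of the non-zero numbers among $x_1, \ldots, x_n$. (By convention, when all numbers are zeros, let the average be zero.)

$$\mathrm{avg}(x_1, \ldots, x_n) := \begin{cases} 0 & \text{if } x_1 = x_2 = \cdots = x_n = 0, \\ \dfrac{x_1 + \cdots + x_n}{\delta_{x_1 \neq 0} + \cdots + \delta_{x_n \neq 0}} & \text{otherwise.} \end{cases}$$

$(\mathbb{R}, \mathrm{avg}, \times)$ is not a semiring because avg is not associative (e.g., $\mathrm{avg}(\mathrm{avg}(1, 2), 3) \neq \mathrm{avg}(1, \mathrm{avg}(2, 3))$). However, we can extend it into the commutative semiring $(\mathbb{R} \times \mathbb{N}, \bar{\oplus}, \bar{\otimes})$ where $\bar{\oplus}$ and $\bar{\otimes}$ are defined as follows. For all $(a_1, b_1), (a_2, b_2) \in \mathbb{R} \times \mathbb{N}$,

$$(a_1, b_1) \,\bar{\oplus}\, (a_2, b_2) := (a_1 + a_2, b_1 + b_2),$$

$$(a_1, b_1) \,\bar{\otimes}\, (a_2, b_2) := (a_1 \times a_2, b_1 \times b_2).$$

Define $f : \mathbb{R} \to \mathbb{R} \times \mathbb{N}$ such that for all $x \in \mathbb{R}$,

$$f(x) := (x, \delta_{x \neq 0}).$$

Define $\bar{f} : \mathbb{R} \times \mathbb{N} \to \mathbb{R}$ such that for all $(a, b) \in \mathbb{R} \times \mathbb{N}$,

$$\bar{f}(a, b) := \begin{cases} 0 & \text{if } b = 0, \\ a/b & \text{otherwise.} \end{cases}$$

It is not hard to verify that conditions 1, 2, and 3 are met in this example. Also, notice that $\mathbf{0} = 0$, $\mathbf{1} = 1$, $\bar{\mathbf{0}} = (0, 0)$ and $\bar{\mathbf{1}} = (1, 1)$.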
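The lifting in this example is easy to try out. The following minimal sketch encodes $f$, $\bar{f}$, $\bar{\oplus}$ and $\bar{\otimes}$ as plain Python functions (the names `f`, `f_bar`, `add_bar`, `mul_bar` are assumptions for illustration) and spot-checks conditions 1 and 2 on a few inputs.

```python
from functools import reduce

def f(x):
    """f: R -> R x N, x |-> (x, [x != 0])."""
    return (x, 1 if x != 0 else 0)

def f_bar(pair):
    """f_bar: R x N -> R, (a, b) |-> 0 if b == 0 else a / b."""
    a, b = pair
    return 0 if b == 0 else a / b

def add_bar(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul_bar(u, v):
    return (u[0] * v[0], u[1] * v[1])

def avg(xs):
    """Average of the non-zero entries, 0 if all entries are zero."""
    nonzero = [x for x in xs if x != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0

xs = [4, 0, 2, 6]
lifted = f_bar(reduce(add_bar, map(f, xs)))   # condition 1: equals avg(xs)
nested = avg([avg([1, 2]), 3])                # avg is not associative ...
flat = avg([1, 2, 3])                         # ... so this value differs
```

Carrying the pair (sum, count) through the computation is what restores associativity; only the final application of `f_bar` divides.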
Example B.3 (Uniqueness quantification ($\exists!$)). Given $n \geq 1$ Boolean variables $b_1, \ldots, b_n$, let $\mathrm{unique}(b_1, \ldots, b_n)$ denote the truth value of whether there is a unique $b_i$ whose value is true.

$$\mathrm{unique}(b_1, \ldots, b_n) := (\exists! i \in [n] \mid b_i = \text{true}).$$

The problem with unique is that it is not cumulative over $\{\text{true}, \text{false}\}$. (For example, unique(true, true, true) = false while unique(unique(true, true), true) = true.) In fact, binary unique reduces to logical XOR, while $n$-ary unique does not correspond to XOR for $n > 2$. Therefore, we cannot use the commutative semiring $(\{\text{true}, \text{false}\}, \mathrm{XOR}, \wedge)$ to solve this FAQ. However, we can extend the domain to become $\{0, 1, 2\}$ and define a commutative semiring $(\{0, 1, 2\}, \bar{\oplus}, \bar{\otimes})$ such that for all $x, y \in \{0, 1, 2\}$,

$$x \,\bar{\oplus}\, y := \min(x + y, 2),$$

$$x \,\bar{\otimes}\, y := \min(x \times y, 2).$$

Define $f : \{\text{true}, \text{false}\} \to \{0, 1, 2\}$ such that

$$f(\text{true}) := 1, \qquad f(\text{false}) := 0.$$

Define $\bar{f} : \{0, 1, 2\} \to \{\text{true}, \text{false}\}$ such that for all $x \in \{0, 1, 2\}$,

$$\bar{f}(x) := (x = 1).$$

Notice that conditions 1, 2, and 3 are met in this example. Also, notice that $\mathbf{0} = \text{false}$, $\mathbf{1} = \text{true}$, $\bar{\mathbf{0}} = 0$, and $\bar{\mathbf{1}} = 1$.
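Since the domains here are finite, conditions 1 to 3 can be checked exhaustively for small $n$. The sketch below does so, taking logical AND as the original product; the function names are illustrative assumptions.

```python
from functools import reduce
from itertools import product

def f(b):
    return 1 if b else 0

def f_bar(x):
    return x == 1

def add_bar(x, y):
    return min(x + y, 2)      # counts "zero, one, or at least two" true values

def mul_bar(x, y):
    return min(x * y, 2)

def unique(bits):
    """True iff exactly one entry of bits is True."""
    return sum(1 for b in bits if b) == 1

# Condition 1, checked for every Boolean input of length 1..4.
condition1 = all(
    unique(bits) == f_bar(reduce(add_bar, map(f, bits)))
    for n in range(1, 5)
    for bits in product([False, True], repeat=n)
)
# Conditions 2 and 3, with logical AND as the product on {true, false}.
condition2 = all(f(a and b) == mul_bar(f(a), f(b))
                 for a in (False, True) for b in (False, True))
condition3 = all(f_bar(mul_bar(x, y)) == (f_bar(x) and f_bar(y))
                 for x in range(3) for y in range(3))
```

The saturating sum keeps just enough information (zero, one, or more than one true value) to answer the uniqueness question at the end.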

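Example B.4 ($(\mathbb{R}, \max, \times)$). $(\mathbb{R}, \max, \times)$ is not a semiring because max does not have an identity over $\mathbb{R}$. However, we can fix this by extending the domain to become $\mathbf{D} := \mathbb{R} \cup \{\mathrm{NaN}\}$ where NaN is a special symbol having the following properties. For all $x \in \mathbf{D}$,

$$\max(x, \mathrm{NaN}) = \max(\mathrm{NaN}, x) = x,$$

$$\min(x, \mathrm{NaN}) = \min(\mathrm{NaN}, x) = x,$$

$$x \times \mathrm{NaN} = \mathrm{NaN} \times x = \mathrm{NaN}.$$

Now, we have the max identity $\mathbf{0} = \mathrm{NaN}$ and the product identity $\mathbf{1} = 1$. However, $(\mathbf{D}, \max, \times)$ is still not a semiring because $\times$ does not distribute over max in $\mathbb{R}$ (e.g., $-1 \times \max(1, 2) \neq \max(-1, -2)$). However, we can extend it into the commutative semiring $(\bar{\mathbf{D}}, \bar{\oplus}, \bar{\otimes})$ where

$$\bar{\mathbf{D}} := \{(\mathrm{NaN}, \mathrm{NaN})\} \cup \{(a, b) \in \mathbb{R}^2 \mid a \leq b\},$$

and for all $(a_1, b_1), (a_2, b_2) \in \bar{\mathbf{D}}$,

$$(a_1, b_1) \,\bar{\oplus}\, (a_2, b_2) := (\min(a_1, a_2), \max(b_1, b_2)),$$

$$(a_1, b_1) \,\bar{\otimes}\, (a_2, b_2) := (\min(a_1 a_2, a_1 b_2, b_1 a_2, b_1 b_2), \max(a_1 a_2, a_1 b_2, b_1 a_2, b_1 b_2)).$$

Notice that $\bar{\mathbf{0}} = (\mathrm{NaN}, \mathrm{NaN})$ while $\bar{\mathbf{1}} = (1, 1)$.

Define $f : \mathbf{D} \to \bar{\mathbf{D}}$ such that for all $x \in \mathbf{D}$,

$$f(x) := (x, x).$$

Define $\bar{f} : \bar{\mathbf{D}} \to \mathbf{D}$ such that for all $(a, b) \in \bar{\mathbf{D}}$,

$$\bar{f}(a, b) := b.$$

This example meets conditions 1 and 2 but not 3.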
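The (min, max) pairs of this example can be tried out directly. In the sketch below `None` stands in for the NaN symbol, which is an implementation assumption, as are the function names; the specific numbers are chosen only to exhibit the failure of distributivity and of condition 3.

```python
NAN = None  # stands in for the special NaN symbol of the text (assumption)

def f(x):
    return (x, x)

def f_bar(pair):
    return pair[1]

def add_bar(u, v):
    if u == (NAN, NAN):
        return v
    if v == (NAN, NAN):
        return u
    return (min(u[0], v[0]), max(u[1], v[1]))

def mul_bar(u, v):
    if u == (NAN, NAN) or v == (NAN, NAN):
        return (NAN, NAN)
    products = [a * b for a in u for b in v]
    return (min(products), max(products))

# Direct evaluation of max_y (c * y) with c = -1 and y in {1, 2}.
direct = max(-1 * y for y in (1, 2))
# Pulling c out of the max is wrong over (R, max, x).
naive = -1 * max(1, 2)
# In the extended semiring the factor can be pulled out safely.
lifted = f_bar(mul_bar(f(-1), add_bar(f(1), f(2))))
# Condition 3 fails: f_bar of a product is not the product of the f_bar values.
u, v = (-3, 1), (-2, 1)
cond3_lhs = f_bar(mul_bar(u, v))
cond3_rhs = f_bar(u) * f_bar(v)
```

Tracking the minimum alongside the maximum is what makes multiplication by negative numbers safe, at the price of staying in the extended domain until all products are done.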
C Tree decompositions and variable elimination

In this section, we prove Proposition 4.9, Proposition 4.10, and Lemma 4.12. To this end, we first describe 
ways to convert back and forth between a tree decomposition and a vertex ordering of a given hypergraph. 
We will also need the notion of a reduced (or non-redundant) tree decomposition. 

Minimal tree decomposition. The definition of tree decompositions doesn't say anything about how large a tree decomposition is, i.e. how many vertices $T$ has. Fortunately, many tree decompositions are "redundant" in the sense that one bag might be a subset of another. In that case, we can reduce the tree decomposition as follows. Let $(T, \chi)$ be a tree decomposition of a hypergraph $\mathcal{H}$. Suppose there is a bag $B$ that is a subset of another bag $B'$. Note that there is a unique path $B = B_0, B_1, \ldots, B_k = B'$ because $T$ is a tree. Then, $B$ has to be a subset of every bag on this path. We can thus remove $B$ and connect all neighbors of $B$, other than $B_1$, to $B_1$. It is easy to see that the new tree is still a tree decomposition of the same hypergraph. When no such bag removal is possible, we have a reduced tree decomposition or a non-redundant tree decomposition. The following shows that reduced tree decompositions do not have too many nodes.

Proposition C.1. Let $(T, \chi)$ be a reduced tree decomposition of a connected hypergraph $\mathcal{H}$ on $n$ vertices, then $T$ has at most $n$ nodes.

Proof. Any leaf bag B of T necessarily has a private vertex v, i.e. a vertex that does not belong to any other 
bag. Now, suppose we remove v from B. If B is a subset of its neighbor, then we also remove B. Either 
way, what we end up with is a reduced tree decomposition of a hypergraph on n — 1 vertices. Induction 
completes the proof. □ 
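The bag-removal step described above for obtaining a reduced tree decomposition can be sketched as follows. This is a minimal illustration, assuming the tree is given as a dictionary of bags plus an adjacency dictionary (both hypothetical representations); it relies on the observation from the text that a redundant bag is always a subset of one of its neighbors.

```python
def reduce_tree_decomposition(bags, adj):
    """Repeatedly remove a bag that is a subset of a neighboring bag.

    bags: dict node -> set of vertices; adj: dict node -> set of neighbor nodes.
    Returns new (bags, adj) dictionaries; the inputs are not modified.
    """
    bags = {t: set(b) for t, b in bags.items()}
    adj = {t: set(ns) for t, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        for t in list(bags):
            target = next((s for s in adj[t] if bags[t] <= bags[s]), None)
            if target is None:
                continue
            # Reconnect the other neighbors of t to the absorbing neighbor.
            for s in adj[t] - {target}:
                adj[s].discard(t)
                adj[s].add(target)
                adj[target].add(s)
            adj[target].discard(t)
            del bags[t], adj[t]
            changed = True
            break
    return bags, adj

bags = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"c", "d"}}
adj = {1: {2}, 2: {1, 3}, 3: {2}}
reduced_bags, reduced_adj = reduce_tree_decomposition(bags, adj)
```

In the example, bag 1 is absorbed by bag 2, leaving a two-node decomposition.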

Tree decomposition from vertex ordering. Given any vertex ordering $\sigma = v_1, \ldots, v_n$ of a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, we construct a tree decomposition recursively as follows. (Recall the hypergraph sequence defined in Definition 4.8.) First, by induction, we construct a tree decomposition $T_{n-1}$ of the hypergraph $\mathcal{H}^\sigma_{n-1}$ using the vertex ordering $v_1, \ldots, v_{n-1}$. (The base case, i.e. the tree decomposition $T_1$, is trivial as the hypergraph $\mathcal{H}_1$ has only one vertex.) Note that $U^\sigma_n - \{v_n\}$ is a hyperedge of $\mathcal{H}^\sigma_{n-1}$. Let $B$ be the bag of $T_{n-1}$ that contains this hyperedge. Now, create a new bag $U^\sigma_n$ and connect it to $B$. Note that all hyperedges in $\partial(v_n)$ are subsets of $U^\sigma_n$. One can verify by induction that this is indeed a tree decomposition of $\mathcal{H}^\sigma_n = \mathcal{H}$. The tree decomposition constructed this way has at most $n$ bags, but it is not necessarily non-redundant. We reduce the tree decomposition to make it non-redundant, and refer to the final tree decomposition as a tree decomposition induced by the vertex ordering. The following proposition is straightforward by induction.

Proposition C.2. In a tree decomposition $(T, \chi)$ induced by the vertex ordering $\sigma = (v_1, \ldots, v_n)$, every bag of $T$ is a set $U^\sigma_k$ for some $k \in [n]$. Furthermore, $(T, \chi)$ is non-redundant.

GYO-elimination procedure. Next we explain why the vertex ordering is the reverse of what typically 
is called an “elimination order”. The typical way to obtain a good vertex ordering from a tree decomposition 
is the GYO-elimination procedure [44,85,91]. In this procedure, we repeatedly apply the following two 
operations on a tree decomposition: (1) remove any bag that is a subset of another bag, (2) remove any 
vertex that belongs to only one bag. Typically this procedure is done by fixing arbitrarily a root bag and 
eliminating bags and vertices from the leaves up to the root bag. 

The reversed sequence of vertices that were removed is a vertex ordering whose induced (fractional) width 
is as good as the (fractional) width of the tree decomposition, as we show below. One key aspect of the 
GYO-procedure is that we could have fixed any bag as the root bag, and thus the vertices inside this bag 
can be eliminated last, which means they will occur first in the resulting vertex ordering. 

Vertex ordering from tree decomposition. However, in this section for technical reasons we also describe a specific realization of the basic GYO-elimination procedure. Given a tree decomposition $(T, \chi)$ of $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ we construct an ordering of vertices $v_1, \ldots, v_n$ as follows. First, we make $(T, \chi)$ a reduced tree decomposition as described above. Second, we designate a node of $T$ as the root node. For every vertex $v \in \mathcal{V}$, let $t_v$ denote the node highest in the tree such that $v \in \chi(t_v)$. We call the bag $\chi(t_v)$ the "owner" of $v$, and $v$ a "private vertex" of $\chi(t_v)$. We construct a vertex ordering as follows:

1. If $T$ is empty, return the empty sequence.

2. Otherwise, take any leaf node $t$ of $T$, and let $P_t$ denote the set of private vertices of $\chi(t)$. (Because $T$ is reduced, $P_t$ is not empty!)

3. Remove $t$ from $T$, resulting in a tree $T'$. Let $\sigma'$ denote the vertex ordering of $\mathcal{V} - P_t$ obtained from $T'$ by induction. Let $\sigma$ be obtained by appending the vertices in $P_t$ to the end of $\sigma'$. (Note that $T'$ still has the same root as $T$.)

The resulting vertex ordering is called a vertex ordering induced by the tree decomposition. The following follows by induction.

Proposition C.3. Let $\sigma = (v_1, \ldots, v_n)$ be a vertex ordering induced by a tree decomposition $(T, \chi)$, then $U^\sigma_k$ is a subset of some bag of $T$, for every $k \in [n]$. Furthermore, the vertex ordering can be constructed in polynomial time.
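The three-step construction of an induced vertex ordering can be sketched as below. This is a minimal sketch under stated assumptions: the rooted tree is given by a hypothetical `parent` dictionary, the decomposition is already reduced, and ties between leaves are broken by sorting node names.

```python
def ordering_from_tree_decomposition(bags, parent):
    """bags: node -> set of vertices; parent: node -> parent node (None for the root).

    Assumes the decomposition is already reduced. Private vertices of leaves that
    are removed earlier end up later in the returned ordering.
    """
    remaining = set(bags)
    blocks = []
    while remaining:
        # A leaf is a remaining node that is not the parent of any remaining node.
        inner = {parent[t] for t in remaining if parent[t] is not None}
        leaf = next(t for t in sorted(remaining) if t not in inner)
        above = bags[parent[leaf]] if parent[leaf] is not None else set()
        # By the running intersection property, the private vertices of a leaf
        # are exactly those that do not occur in its parent's bag.
        blocks.append(sorted(bags[leaf] - above))
        remaining.remove(leaf)
    return [v for block in reversed(blocks) for v in block]

bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
parent = {0: None, 1: 0, 2: 1}
sigma = ordering_from_tree_decomposition(bags, parent)
```

The vertices of the root bag come first in the ordering, matching the remark that the root's vertices are eliminated last.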

Now we are ready to characterize acyclicity and widths using vertex orderings. 

Proof of Proposition 4.9. For the forward direction, assume $\mathcal{H}$ is $\alpha$-acyclic. Then, there is a tree decomposition $(T, \chi)$ of $\mathcal{H}$ in which every bag is a hyperedge. Let $\sigma = (v_1, \ldots, v_n)$ be an induced vertex ordering of $(T, \chi)$. By Proposition C.3, $U^\sigma_k \subseteq \chi(t)$ for some $t \in V(T)$. Since $\chi(t)$ is a hyperedge of $\mathcal{H}$, $\chi(t) \in \mathcal{E}_k$ and hence $U^\sigma_k = \chi(t) \in \partial^\sigma(v_k)$.

Conversely, suppose there is a vertex ordering $\sigma$ for which $U^\sigma_k \in \partial^\sigma(v_k)$ for all $k \in [n]$. By induction, $\mathcal{H}^\sigma_{n-1}$ is $\alpha$-acyclic. Thus, there is a tree decomposition $T_{n-1}$ in which every bag is a hyperedge of $\mathcal{H}_{n-1}$. There is a hyperedge $B = U_n - \{v_n\}$ in $\mathcal{H}_{n-1}$ which may not be a hyperedge of $\mathcal{H}_n$. If $B$ is indeed not a hyperedge of $\mathcal{H}_n = \mathcal{H}$, then we replace $B$ by $U_n$, add a bag for each set in $\partial(v_n) - \{U_n\}$, and connect those bags to the $U_n$-bag. If $B$ is already a hyperedge of $\mathcal{H}$, then we create a new bag $B' = B \cup \{v_n\}$, connect it to $B$, and connect the remaining bags in $\partial(v_n)$ to $B'$ as before. Either way, we have just created a tree decomposition of $\mathcal{H}$ in which every bag is a hyperedge and vice versa. □

Proposition 4.10 was shown in [65]; we reproduce a stand-alone proof here for completeness. To prove Proposition 4.10, we use a different characterization of $\beta$-acyclicity:

Proposition C.4 (See [36]). A hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is $\beta$-acyclic if and only if there is no sequence

$$(F_1, u_1, F_2, u_2, \ldots, F_m, u_m, F_{m+1} = F_1)$$

with the following properties

• $m \geq 3$

• $u_1, \ldots, u_m$ are distinct vertices of $\mathcal{H}$

• $F_1, \ldots, F_m$ are distinct hyperedges of $\mathcal{H}$

• for every $i \in [m]$, $u_i \in F_i \cap F_{i+1}$, and $u_i \notin F_j$ for every $j \in [m+1] - \{i, i+1\}$.
Proof of Proposition 4.10. For the forward direction, suppose $\mathcal{H}$ is $\beta$-acyclic. A nest point of $\mathcal{H}$ is a vertex $v \in \mathcal{H}$ such that the collection of hyperedges containing $v$ forms a nested sequence of subsets, one contained in the next. Any $\beta$-acyclic hypergraph $\mathcal{H}$ has at least two nest points [19]. Let $v_n$ be a nest point of $\mathcal{H} = \mathcal{H}_n$. Then, $\partial(v_n)$ is an inclusion chain and $U_n$ is the bottom element of $\partial(v_n)$. Consequently, $\mathcal{H}_{n-1}$ is precisely $\mathcal{H} - \{v_n\}$, which is $\beta$-acyclic. By induction there exists an elimination order $v_1, \ldots, v_{n-1}$ such that every collection $\partial(v_k)$ forms a chain, $k \in [n-1]$. Thus, the elimination order $v_1, \ldots, v_n$ satisfies the desired property.

Conversely, suppose there exists an ordering $\sigma = (v_1, \ldots, v_n)$ of all vertices of $\mathcal{H}$ such that every collection $\partial^\sigma(v_k)$ is a chain. Assume to the contrary that $\mathcal{H}$ is not $\beta$-acyclic. Then, there is a sequence

$$(F_1, u_1, F_2, u_2, \ldots, F_m, u_m, F_{m+1} = F_1)$$

satisfying the conditions stated in Proposition C.4. Without loss of generality, suppose $u_m$ comes last among $u_1, \ldots, u_m$ in the vertex ordering $\sigma$, and that $u_m = v_k$ for some $k$. Then, the poset $\partial^\sigma(v_k)$ contains the set $F_m \cap \{v_1, \ldots, v_{k-1}\}$ and the set $F_1 \cap \{v_1, \ldots, v_{k-1}\}$. Since both $u_1$ and $u_{m-1}$ come before $u_m$ in the ordering, we have

$$u_1 \in (F_1 \cap \{v_1, \ldots, v_{k-1}\}) \setminus (F_m \cap \{v_1, \ldots, v_{k-1}\}),$$

$$u_{m-1} \in (F_m \cap \{v_1, \ldots, v_{k-1}\}) \setminus (F_1 \cap \{v_1, \ldots, v_{k-1}\}).$$

Consequently, $\partial^\sigma(v_k)$ is not a chain. □
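The forward direction of the proof is constructive: repeatedly removing a nest point yields the ordering. The sketch below illustrates this idea on explicit edge sets; the function name and the greedy choice of the first nest point in sorted order are assumptions for illustration, and it works on the hypergraph with the vertex deleted from each edge rather than on the paper's hypergraph sequence.

```python
def nest_point_ordering(vertices, edges):
    """Return a vertex ordering v_1..v_n obtained by repeatedly removing a nest point,
    or None if at some stage no nest point exists."""
    edges = {frozenset(e) for e in edges}
    remaining = set(vertices)
    removed = []
    while remaining:
        for v in sorted(remaining):
            incident = sorted((e for e in edges if v in e), key=len)
            # v is a nest point iff its incident edges form a chain under inclusion.
            if all(a <= b for a, b in zip(incident, incident[1:])):
                break
        else:
            return None
        removed.append(v)
        remaining.remove(v)
        edges = {e - {v} for e in edges if e - {v}}
    return removed[::-1]   # the vertex removed first plays the role of v_n

chain_order = nest_point_ordering("abc", [{"a"}, {"a", "b"}, {"a", "b", "c"}])
triangle_order = nest_point_ordering("abc", [{"a", "b"}, {"b", "c"}, {"a", "c"}])
covered_triangle_order = nest_point_ordering(
    "abc", [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"}])
```

The last example is $\alpha$-acyclic (the big edge covers the triangle) but still has no nest point, in line with $\beta$-acyclicity being the stricter notion.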

Proof of Lemma 4.12. For the forward direction, consider a tree decomposition $(T, \chi)$ with $g$-width $w$ and one of its induced vertex orderings $\sigma = v_1, \ldots, v_n$. By Proposition C.3, every set $U^\sigma_k$ is a subset of some bag $\chi(t)$, $t \in V(T)$. Thus, due to $g$'s monotonicity, $g(U^\sigma_k) \leq g(\chi(t)) \leq w$.

Conversely, suppose there is a vertex ordering $\sigma = v_1, \ldots, v_n$ for which $g(U^\sigma_k) \leq w$ for all $k \in [n]$. Let $T$ be a tree decomposition induced by this ordering. Then, by Proposition C.2, every bag in $T$ is some set $U^\sigma_k$, which completes the proof. □

D Missing details from the analysis of InsideOut 

Proof of Theorem 5.1. Consider an FAQ-SS query $\varphi$ with free variables $X_{[f]}$. As discussed, OutsideIn is basically LeapFrog Triejoin [89]: it finds, by backtracking search, all tuples $\mathbf{x}_{[n]}$ for which $\psi_S(\mathbf{x}_S) \neq \mathbf{0}$ for all $S \in \mathcal{E}$. For each found tuple, the algorithm adds the product $\bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S)$ to the entry $\varphi(\mathbf{x}_{[f]})$. Hence, the overall runtime is dominated by the backtracking search itself, which can be bounded by $O(mn \cdot \mathrm{AGM}(\mathcal{H}) \cdot \log N)$ (see [69]). □
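The search-and-accumulate structure described in this proof can be sketched as follows. This is a simplified illustration over the sum-product semiring, not the LeapFrog Triejoin algorithm itself: it prunes only when a factor is fully assigned, it carries no worst-case optimality guarantee, and the listing-based factor representation and the name `outside_in` are assumptions.

```python
from collections import defaultdict

def outside_in(domains, num_free, factors):
    """Sum-product FAQ-SS by backtracking search over tuples with all factors non-zero.

    factors: list of (scope, table); scope is a tuple of variable indices and table
    maps x_S to a non-zero value (missing entries are treated as 0).
    The free variables are assumed to be the first num_free variables.
    """
    n = len(domains)
    phi = defaultdict(int)

    def search(x):
        # Prune as soon as a fully assigned factor evaluates to zero.
        for scope, table in factors:
            if max(scope) < len(x) and tuple(x[i] for i in scope) not in table:
                return
        if len(x) == n:
            value = 1
            for scope, table in factors:
                value *= table[tuple(x[i] for i in scope)]
            phi[x[:num_free]] += value      # accumulate into the entry phi(x_[f])
            return
        for v in domains[len(x)]:
            search(x + (v,))

    search(())
    return dict(phi)

# Matrix product as a query: variables (i, k, j) with i, k free and j summed out.
A = {(0, 0): 1, (0, 1): 2, (1, 1): 3}       # A[i, j], zero entries omitted
B = {(0, 0): 4, (1, 0): 5, (1, 1): 6}       # B[j, k], zero entries omitted
dom = (0, 1)
result = outside_in([dom, dom, dom], 2, [((0, 2), A), ((2, 1), B)])
```

The example recovers the entries of the product of the two sparse matrices.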

Proof of Proposition 5.9. For the sake of brevity, define $w = \max_{k \in K} \rho^*_{\mathcal{H}}(U_k)$. We use the same analysis as that in the proof of Theorem 5.5. However, we further bound $\|\psi_S\|$ by $N^w$ and also replace all the $\mathrm{AGM}_{\mathcal{H}_k}(U_k)$ terms by $N^w$. Note that every factor $\psi_S$ is either an input factor ($S \in \mathcal{E}$) or an intermediate factor ($S = U_k - \{k\}$ for some $k$).

• If $\psi_S$ is an input factor then $\|\psi_S\| \leq N \leq N^w$ because $w \geq 1$. If $\psi_S$ is an intermediate factor ($S = U_k - \{k\}$), then with exactly the same reasoning as in the proof of Theorem 5.1, $\|\psi_S\|$ can be bounded by $\mathrm{AGM}_{\mathcal{H}}(U_k)$, which is upper bounded by $N^w$.

• Next, for $k \in K$, we show that the computation time of $\psi_{U_k - \{k\}}$ using OutsideIn can be bounded by $O(mn \,\mathrm{AGM}_{\mathcal{H}}(U_k) \log N)$ instead of $O(mn \,\mathrm{AGM}_{\mathcal{H}_k}(U_k) \log N)$ (see footnote 19). Since $\mathrm{AGM}_{\mathcal{H}}(U_k) \leq N^w$, we would be done. This simple but subtle fact is best explained using the language of database join. For each $k \in [n]$ and $S \in \mathcal{E}_k$, define a relation

$$R(\psi_S) := \{\mathbf{x}_S \mid \psi_S(\mathbf{x}_S) \neq \mathbf{0}\}.$$

The runtime of OutsideIn computing $\psi_{U_k - \{k\}}$ is precisely the same runtime when we use a worst-case optimal join algorithm to compute the natural join

$$Q = \bowtie_{S \in \mathcal{E}_k,\; S \cap U_k \neq \emptyset} R(\psi_{S/U_k}).$$

The runtime bound $\mathrm{AGM}_{\mathcal{H}_k}(U_k)$ is a bound on the maximum number of possible output tuples of $Q$. Now, consider the query

$$Q' = \bowtie_{S \in \mathcal{E},\; S \cap U_k \neq \emptyset} R(\psi_{S/U_k}).$$

Then, it is easy to see that $Q \subseteq Q'$, and thus the maximum number of output tuples of $Q$ is bounded by the maximum number of output tuples of $Q'$, which is $\mathrm{AGM}_{\mathcal{H}}(U_k)$.

□

Footnote 19: Note that the bound $\mathrm{AGM}_{\mathcal{H}_k}(U_k)$ is computed from the best fractional edge cover of $U_k$ using hyperedges $\mathcal{E}_k$, which, compared to $\mathcal{E}$, has additional intermediate hyperedges/relations and also lacks hyperedges/relations from $\mathcal{E}$ which belong to $\partial(j)$ for $j > k$.


E Quick applications of InsideOut 

We describe here some examples where expression (14) is “easy” to minimize. Note that to avoid cumbersome 
notation, up to re-indexing of variables, we have stated (14) with the assumption that the variable ordering 
of the input expression is from X\ to X n . In the examples below, the variable indices might be arranged 
differently from the natural 1 to n order. 

Example E.1 (Matrix Chain Multiplication). Consider the matrix multiplication problem and its reduction to FAQ-SS described in Example 1.1. In this problem, the set of free variables is $F = \{1, n+1\}$. Let $v_2, \ldots, v_n$ be an arbitrary permutation of $\{2, \ldots, n\}$, and let $v_1 = 1$, $v_{n+1} = n+1$. Then, the problem can be re-stated as computing the function

$$\varphi(x_1, x_{n+1}) = \sum_{x_{v_2}} \sum_{x_{v_3}} \cdots \sum_{x_{v_n}} \prod_{i=1}^{n} \psi_{i,i+1}(x_i, x_{i+1}).$$

And, expression (14) becomes

$$\log N \cdot O\left( \sum_{k} |U_k| \cdot |\{S \in \mathcal{E} : S \cap U_k \neq \emptyset\}| \cdot \mathrm{AGM}_{\mathcal{H}_k}(U_k) + n p_1 p_{n+1} \right). \tag{38}$$

Let us see how one can find a variable ordering $v_2, \ldots, v_n$ to minimize this expression.

Let $\ell = v_n \in \{2, \ldots, n\}$. Then, for this variable ordering $U_n = \{\ell-1, \ell, \ell+1\}$. In fact, it is easy to see that every set $U_k$ will have size 3 during the elimination process; and the number of hyperedges intersecting $U_k$ is at most 4. To fractionally cover $U_n$, we can bound

$$|\psi_{\ell,\ell+1}| \leq p_\ell \, p_{\ell+1},$$

$$|\psi_{\ell-2,\ell-1}/U_n| \leq p_{\ell-1}.$$

Hence, we can bound the term $|U_n| \cdot |\{S \in \mathcal{E} : S \cap U_n \neq \emptyset\}| \cdot \mathrm{AGM}_{\mathcal{H}_n}(U_n)$ in (38) to be $12 \cdot p_{\ell-1} \cdot p_\ell \cdot p_{\ell+1}$. This is exactly the usual rough estimate of the cost of multiplying a $p_{\ell-1} \times p_\ell$-matrix with a $p_\ell \times p_{\ell+1}$-matrix.

Inductively, finding the best variable ordering to minimize (38) is exactly the same as finding the best 
sequence of matrix multiplications to minimize the overall multiplication cost as set up in the textbook 
Matrix Chain Multiplication problem [28]. This problem can be solved by dynamic programming! 
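For reference, the textbook dynamic program mentioned above can be sketched as follows. The function name and the example dimensions are illustrative assumptions; the list `p` is 0-indexed, so matrix $A_i$ has dimensions `p[i-1]` by `p[i]`, which shifts the indexing of the text by one.

```python
from functools import lru_cache

def matrix_chain_cost(p):
    """Minimum number of scalar multiplications to compute A_1 ... A_n,
    where A_i has dimensions p[i-1] x p[i] (p is a list of length n + 1)."""
    n = len(p) - 1

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cheapest way to multiply A_i ... A_j (1-indexed, inclusive).
        if i == j:
            return 0
        # Splitting at k costs a (p[i-1] x p[k]) times (p[k] x p[j]) product on top.
        return min(best(i, k) + best(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))

    return best(1, n)

cost = matrix_chain_cost([10, 30, 5, 60])
```

Choosing the split point `k` last corresponds to eliminating that shared dimension last, which is how the choice of variable ordering and the choice of parenthesization line up.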

Example E.2 (Matrix Vector Multiplication for structured matrices). We begin with the DFT matrix. Recall that we assume that $n = p^m$ for some constant prime $p$. Using the bound in (38), one can show that InsideOut runs in time $O(m^4 p^m) = O(n \log^4 n)$, which is a $\log^3 n$ factor off from the $O(n \log n)$ runtime of FFT. Next, we will show that our framework contains, as a special case, the message passing interpretation of the FFT algorithm of Aji and McEliece [7,8]. In particular, we will need to do some pre-processing on the input factors before running InsideOut. For every $0 \leq k < m$, define

$$\psi_{\cdot,Y_k}(x_0, \ldots, x_{m-k-1}, y_k) = \prod_{j \,:\, 0 \leq j+k < m} \psi_{X_j,Y_k}(x_j, y_k).$$

Note that the entire truth table representation of $\psi_{\cdot,Y_k}$ can be computed in time $O(m p^{m-k})$, and hence one can compute the truth tables of $\psi_{\cdot,Y_k}$ for all $0 \leq k < m$ in time $O(m p^m) = O(n \log n)$. Note that if we have these factors pre-computed, then we want to compute the following FAQ-SS instance

$$\sum_{(y_0,\ldots,y_{m-1}) \in \mathbb{Z}_p^m} \psi_Y(y_0, \ldots, y_{m-1}) \cdot \prod_{k=0}^{m-1} \psi_{\cdot,Y_k}(x_0, \ldots, x_{m-k-1}, y_k).$$

Now consider the variable ordering $Y_{m-1}, \ldots, Y_0$ and note that $U_k$ when eliminating $Y_k$ (note we changed the notation from the default one) is given by $U_k = \{X_0, \ldots, X_{m-k}, Y_0, \ldots, Y_k\}$. We further note that since $\mathbf{D} = \{0, 1\}$ we do not need to use the indicator projections in InsideOut. In particular, this implies that line 7 in Algorithm 1 is computing

$$\sum_{y_k \in \mathbb{Z}_p} \psi_{\cdot,Y_k} \cdot \psi_{U_{k+1}},$$

which can be done in $O(p^m)$ time. Since the above line is executed $m$ times, the overall run time is $O(m p^m) = O(n \log n)$, as desired.

For the circulant matrix, we note that if one picks the variable ordering 

5 Z-m— !;•••) Xq, Y m — \,.. - , 10 


then Algorithm 1 with this variable ordering corresponds to computing $C \cdot b$ by computing $F \cdot b$ first, then computing $F \cdot c$ next, then computing their component-wise product $u = (F \cdot c) \cdot (F \cdot b)$, and then finally computing $F^{-1} \cdot u$. Note that each of the DFT computations we just saw is done in $O(n \log n)$ time, leading to an overall runtime of $O(n \log n)$ to compute $C \cdot b$.

Next, we consider the case when $A$ is the Kronecker product $D \otimes E$. In this case, consider the variable ordering $Y_1, Y_0$ and the corresponding $U_1 = \{X_1, Y_0, Y_1\}$ and $U_0 = \{X_0, X_1, Y_0\}$. To fractionally cover $U_1$, set $\lambda^{(1)}_{X_1,Y_1} = \lambda^{(1)}_{X_0,Y_0} = 1$. Similarly, to cover $U_0$, set $\lambda^{(0)}_{X_0,Y_0} = \lambda^{(0)}_{X_1,Y_1} = 1$. Finally, to cover $F = \{X_0, X_1\}$, set $\mu_{X_1,Y_1} = \mu_{X_0,Y_0} = 1$. Then (38) gives an overall runtime bound of $O(n^{3/2})$.

We defer the discussion on the Khatri-Rao product case and now consider the case of $A = D \circ E$. Now consider the run of Algorithm 1 with variable ordering $Y_0, Y_2, Y_1, Y_3$. The corresponding sets are $U_0 = \{X_0, X_2, Y_0, Y_1, Y_2, Y_3\}$, $U_2 = \{X_0, X_2, Y_1, Y_2, Y_3\}$, $U_1 = \{X_0, X_1, X_2, X_3, Y_1, Y_3\}$ and $U_3 = \{X_0, X_1, X_2, X_3, Y_3\}$. We note that it is enough to cover $U_0$ and $U_1$ since $U_2 \subset U_0$ and $U_3 \subset U_1$. To cover $U_0$, set $\lambda^{(0)}_{X_0,X_2,Y_0,Y_2} = \lambda^{(0)}_{X_1,X_3,Y_1,Y_3} = 1$, and to cover $U_1$ set $\lambda^{(1)}_{X_0,X_2,Y_0,Y_2} = \lambda^{(1)}_{X_1,X_3,Y_1,Y_3} = 1$. Finally, to cover $F = \{X_0, X_1, X_2, X_3\}$, set $\mu_{X_0,X_2,Y_0,Y_2} = \mu_{X_1,X_3,Y_1,Y_3} = 1$. Then (38) gives an overall runtime bound of $O(n^{3/2})$.

Finally, we consider the case when $A$ is the Khatri-Rao product $A = D * E$. Now consider the run of Algorithm 1 with variable ordering $Y_0, Y_1, Y_2$. The corresponding sets are $U_0 = \{X_0, X_2, Y_0, Y_1, Y_2\}$, $U_1 = \{X_0, X_1, X_2, Y_1, Y_2\}$ and $U_2 = \{X_0, X_1, X_2, Y_2\}$. Since $U_2 \subset U_1$, we only need to cover $U_0$ and $U_1$. If we just use an edge cover and then (38), we will get the trivial quadratic runtime. However, like in the case of the DFT, we can consider the InsideOut algorithm and argue an overall runtime of $O(n^{5/3})$. Since all the factors are represented using the truth table representation, Algorithm 1 would run in the same time even if we added in the projection of the factor $\psi_{X_0,Y_0,X_2,Y_2}$ onto $\{X_0, Y_0\}$ and the projection of $\psi_{X_1,Y_1,X_2,Y_2}$ onto $\{X_1, Y_1\}$. With these extra factors, we can cover $U_0$ with $\lambda^{(0)}_{X_0,X_2,Y_0,Y_2} = \lambda^{(0)}_{X_1,Y_1} = 1$, $U_1$ with $\lambda^{(1)}_{X_1,X_2,Y_1,Y_2} = \lambda^{(1)}_{X_0,Y_0} = 1$, and $F = \{X_0, X_1, X_2\}$ by $\mu_{X_0,Y_0} = \mu_{X_1,X_2,Y_1,Y_2} = 1$. Then (38) implies an overall runtime of $O(n^{5/3})$, as desired.
F More on characterizing EVO($\varphi$) and approximating faqw($\varphi$)

F.l Quick applications of FAQ-SS without free variables (i.e. SumProd) 

FAQ-SS without free variables is the easiest special case of FAQ because all variable orderings are $\varphi$-equivalent. In this case, every variable aggregate is a semiring aggregate. Hence, $K = [n]$. Expression (15) becomes a simple quantity $\log N \cdot O\left(mn^2 N^{\mathrm{faqw}(\sigma)} + f(f+m)\|\varphi\|\right)$, where

$$\mathrm{faqw}(\sigma) = \max_{k \in [n]} \rho^*_{\mathcal{H}}(U_k).$$

In this case, $\mathrm{faqw}(\sigma)$ is exactly the induced fractional edge cover width of $\sigma$ (Definition 4.11). Thanks to Lemma 4.12, it follows that

$$\mathrm{faqw}(\varphi) = \mathrm{fhtw}(\mathcal{H}).$$

However, computing (a tree decomposition with) the optimal fhtw is NP-hard [37]. From the approximation
algorithm of Marx [60] we obtain the following corollaries, which are essentially what was shown in
Grohe and Marx [60] applied to a wider context.

Corollary F.1 (Grohe-Marx [47,60]). FAQ-SS without free variables on any semiring can be solved in time

$$\tilde{O}\left(N^{O(\mathrm{fhtw}^3(\mathcal{H}))} + \|\varphi\|\right),$$

where $\|\varphi\|$ is the time needed to report the output (and under the assumption that fhtw($\mathcal{H}$) $\le c$ for some
constant $c$).$^{20}$

In most of our applications, $\|\varphi\| = O(1)$; however, as we have seen earlier in Example A.17 and shall visit
again below for the natural join problem, $\|\varphi\|$ can be a lot larger than $N^{\mathrm{fhtw}(\mathcal{H})}$. We first describe several
applications where $\|\varphi\| = O(1)$.

Corollary F.2 (Theorem 1 from [79]). Quantifier-free #CQ is solvable in time $O(N^{O(\mathrm{fhtw}^3(\mathcal{H}))})$ (under the
assumption that fhtw($\mathcal{H}$) $\le c$ for some constant $c$). In particular, #CQ is tractable for the class of conjunctive
queries with bounded fractional hypertree width.

The following result in the probabilistic graphical model context might have been known, but we could
not find a paper or textbook that proves it. (Note that $N^{\mathrm{fhtw}(\mathcal{H})} \le D^{\mathrm{tw}(\mathcal{H})+1}$ if each input domain is of size
$D$, and $N^{\mathrm{fhtw}(\mathcal{H})}$ can be arbitrarily smaller than $D^{\mathrm{tw}(\mathcal{H})+1}$ for sparse inputs.)

Corollary F.3 (Partition function in PGM). The partition function in probabilistic graphical models can be
computed in time $O(N^{O(\mathrm{fhtw}^3(\mathcal{H}))})$ (under the assumption that fhtw($\mathcal{H}$) $\le c$ for some constant $c$).

For the natural join problem, there is the issue of output reporting. We next work out what InsideOut
does to the set semiring formulation of the natural join problem (Example A.17), and briefly analyze the time
it takes to report the output under this semiring. We prove here the $O(N^{\mathrm{fhtw}} + \|\varphi\|)$-runtime of InsideOut
for computing joins as shown in Table 1.

Example F.4 (InsideOut for joins under the set semiring). To detail how InsideOut works with the $(2^{U}, \cup, \cap)$
semiring, we describe its behavior using the familiar relational algebra notation [1]. For this problem, the
input factors $\psi_S$ are the input relations $R_S$; the intermediate factors are materialized intermediate relations;
the indicator projections are precisely the relational algebra projections. Following (7), the first intermediate
factor we want to compute is

$$R_{U_n} = \left(\Join_{S \in \partial(n)} R_S\right) \Join \left(\Join_{S \notin \partial(n)} \pi_{U_n}(R_S)\right).$$

20 This assumption is inherited from Marx's approximation algorithm [60].

Then, we would like to “marginalize out” $X_n$ under the $\cup$ operator of the set semiring. But we certainly
do not want to perform the set union by brute force, because each tuple $t_{U_n}$ is mapped to $D^{n-|U_n|}$ points in the
output space $U$. Luckily, all we have to do is to store $R_{U_n}$ using a B-tree or trie data structure in an attribute
order such that $X_n$ comes last. This way, if we only traverse the trie on the first $|U_n| - 1$ attributes, we
will have access to exactly the tuples in $R_{U_n - \{X_n\}}$. Furthermore, if later we also want to visit the tuples in
$R_{U_n}$, we can go down one more level in the trie. Thus, a trie-like index that respects the variable ordering
implicitly computes the marginalization operation for us!
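
The trie observation can be illustrated with a minimal Python sketch. The names `build_trie` and `paths` and the toy relation are hypothetical and not part of InsideOut itself; the sketch only assumes that tuples are stored with the variable to be marginalized in the last position.

```python
def build_trie(tuples):
    """Nested-dict trie with one level per attribute, in the given attribute order."""
    root = {}
    for t in tuples:
        node = root
        for value in t:
            node = node.setdefault(value, {})
    return root


def paths(node, depth):
    """Enumerate, in sorted order, all distinct prefixes of length `depth` in the trie."""
    if depth == 0:
        yield ()
        return
    for value in sorted(node):
        for rest in paths(node[value], depth - 1):
            yield (value,) + rest


# Toy relation over attributes (X1, X2, X3); X3 plays the role of X_n and comes last.
R = [(1, 1, 5), (1, 1, 7), (1, 2, 5), (2, 1, 9)]
trie = build_trie(R)
full = list(paths(trie, 3))       # the relation itself
projected = list(paths(trie, 2))  # X3 marginalized out under set union
```

Stopping the traversal one level early yields the projection without building a second structure, which is the sense in which the index computes the marginalization implicitly.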

The next issue is how much time it takes to report the output. Since our representation of the output set
is implicit, the set $R_{U_1}$ is an implicit representation of the output. Up until the point of computing $R_{U_1}$ we
have only spent $O(N^{\mathrm{faqw}(\sigma)})$ time, but the total number of output tuples might be a lot larger than that.$^{21}$
In order to actually report all output tuples, we will go through each $x_1 \in R_{U_1}$, then each $x_2$ such that
$(x_1, x_2)$ aligns with $R_{U_2}$, and so forth, until $R_{U_n}$. This is a series of semi-join reductions in disguise, and
it roughly corresponds to the second and the third phases of Yannakakis's algorithm [90] performed at once.
The total amount of time to report all output tuples is $O(n\|\varphi\| \log N)$, where $\|\varphi\|$ is the number of output
tuples.

Corollary F.5 (Relational join). Given an optimal variable ordering, the InsideOut algorithm can solve the
natural join query in time

$$O\left(N^{\mathrm{fhtw}(\mathcal{H})} + \|\varphi\|\right),$$

where $\|\varphi\|$ is the output size. Moreover, if an optimal ordering is not given and assuming that fhtw($\mathcal{H}$) $\le c$
for some constant $c$, the runtime becomes

$$O\left(N^{O(\mathrm{fhtw}^3(\mathcal{H}))} + \|\varphi\|\right).$$

The above discussion assumes that we are presenting the output in the listing representation. There are
other options, which Section 8.4 discusses.

F.2 Variable ordering for FAQ with two blocks of semiring aggregates 

This section highlights some of the subtle problems that arise when there is more than one type of variable
aggregate. In particular, we consider the following instance of the FAQ problem where there are no free variables,

$$\varphi = \bigoplus_{\mathbf{x}_L} \; \bar{\bigoplus}_{\mathbf{x}_{[n]-L}} \; \bigotimes_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S). \qquad (39)$$

The input FAQ query has two blocks of variable aggregates. In the first block, for variables $X_i$ where $i \in L$,
$\oplus$ is a semiring aggregate; the second block, for variables $X_i$, $i \notin L$, has $\bar{\oplus}$ as a functionally different
semiring aggregate. Example A.18 showed that #CQ can be reduced to an FAQ-instance of this form, where
$\oplus = \sum$ and $\bar{\oplus} = \max$. Note that $\oplus$ and $\bar{\oplus}$ being functionally different means that they do not commute,
as we showed in Proposition 6.6.

F.2.1 The precedence poset 

Define a poset called the precedence poset $P = ([n], \preceq)$ over the variables as follows. Let $u \prec v$ for every pair
$(u, v)$ in the same connected component of $\mathcal{H}$ such that $u \in L$ and $v \notin L$. Let LinEx($P$) denote the set of
linear extensions of the precedence poset $P$. We first show that LinEx($P$) is “sound”.

Proposition F.6 (LinEx($P$) $\subseteq$ EVO($\varphi$)). Suppose $\oplus$ and $\bar{\oplus}$ are not identical in the domain $D$. Given a
query $\varphi$ with only two blocks of variable aggregates as written in (39), we have LinEx($P$) $\subseteq$ EVO($\varphi$).

21 Consider the star join query $\varphi = \Join_{i=1}^{n-1} R_i(X, Y_i)$. Since $\varphi$ is acyclic, fhtw($\mathcal{H}$) $= 1$, but it is easy to come up with instances
of the relations/factors such that the output size is as large as $N^{n-1}$ (consider e.g. the case when each $R_i = \{1\} \times [N]$).

Proof. Suppose P has only one connected component. Then, every linear extension of the poset P has
variables in L listed before variables in [n] — L; the soundness of LinEx(P) thus follows. If P has multiple 
connected components, then expression (39) factorizes into a product over those connected components, 
where each factor in this product is an FAQ-expression over the variables in that connected component. So 
we are back to the case when there is only one connected component. This proves that LinEx(P) is sound. □ 

In Section 6.1, we prove a more general result stating that every $\varphi$-equivalent variable ordering $\sigma$ has the
same faqw as some $\pi \in$ LinEx($P$). In that sense, LinEx($P$) is also “complete” and we only need to look in
LinEx($P$) to find a $\sigma$ minimizing faqw($\sigma$). In particular, we will show that

$$\mathrm{faqw}(\varphi) = \min_{\sigma \in \mathrm{LinEx}(P)} \mathrm{faqw}(\sigma). \qquad (40)$$

We next explain why the precedence poset and its linear extensions help find a good variable ordering. In
this case, since there is no free variable, expression (15) becomes simply $O(\log N \cdot mn^2 N^{\mathrm{faqw}(\sigma)})$. However,
it is no longer clear how faqw($\varphi$) is related to fhtw($\mathcal{H}$). Here are some simple properties that we can observe
straightaway:

• faqw($\varphi$) $\ge$ fhtw($\mathcal{H}$), for any $L \subseteq [n]$.

• If $L = [n]$ or $L = \emptyset$, then faqw($\varphi$) $=$ fhtw($\mathcal{H}$).

• When $\emptyset \subset L \subset [n]$, the FAQ-width faqw($\varphi$) can be arbitrarily far from fhtw($\mathcal{H}$). For example, consider
the star graph $V = \{1, \ldots, n\}$ with edges $(1, n), (2, n), \ldots, (n-1, n)$. In this case, fhtw($\mathcal{H}$) $= 1$ because
the graph is acyclic. Now, let $L = [n-1]$; then every $\varphi$-equivalent variable ordering has $n$ as the last
vertex, which means faqw($\sigma$) $= n - 1$ for every $\varphi$-equivalent variable ordering.
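
The star-graph example can be checked with a short Python sketch. The function `elimination_sets` is a hypothetical helper, not code from the paper; it assumes the usual elimination procedure in which variables are eliminated from the last to the first in the ordering and, after eliminating $v$, the set $U_v - \{v\}$ is inserted back as a new hyperedge.

```python
def elimination_sets(edges, order):
    """Return {v: U_v}, eliminating vertices from the last to the first in `order`."""
    edges = [frozenset(e) for e in edges]
    U = {}
    for v in reversed(order):
        incident = [e for e in edges if v in e]
        U_v = frozenset().union(*incident) if incident else frozenset({v})
        U[v] = U_v
        # Remove the edges containing v and insert back the hyperedge U_v - {v}.
        edges = [e for e in edges if v not in e]
        rest = U_v - {v}
        if rest:
            edges.append(rest)
    return U


n = 6
star = [(i, n) for i in range(1, n)]
# Center n is last in the ordering, hence eliminated first (forced when L = [n-1]).
U_center_last = elimination_sets(star, list(range(1, n + 1)))
# Center n is first in the ordering, hence eliminated last (allowed only when L is trivial).
U_center_first = elimination_sets(star, [n] + list(range(1, n)))
```

With the center last, $U_n$ is the whole vertex set. Every edge of the star contains exactly one leaf, so covering the $n - 1$ leaves needs total weight at least $n - 1$, which matches the width stated above. With the center first, every elimination set has at most two vertices.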

F.2.2 Relation to L-star size 

Before proving our main result for this section, we analyze how the FAQ-width faqw($\varphi$) is related to the
notion of width that was defined in Durand and Mengel [34] for dealing with the #CQ problem. Durand and
Mengel [34] introduced the notion of $L$-star size of a hypergraph to characterize the complexity of counting
solutions to conjunctive queries.

Let $\mathcal{F}$ be any set of hyperedges and $B$ be any set of vertices. Then, the independence number $\alpha_{\mathcal{F}}(B)$ is
the maximum size of a set $I \subseteq V$ satisfying the following conditions: (1) $I \subseteq B$, and (2) no two vertices
from $I$ belong to the same hyperedge in $\mathcal{F}$.

Definition F.7 ($L$-star size). Let $\mathcal{H} = (V, \mathcal{E})$ be a hypergraph, and $L$ be a subset of vertices. For any
connected component $C$ of $\mathcal{H} - L$ with vertex set $V(C)$ and edge set $\mathcal{E}(C)$, define

$$\bar{\mathcal{E}}(C) = \{S \in \mathcal{E} \mid S \cap V(C) \neq \emptyset\} \qquad (41)$$

$$U(C) = L \cap \left(\bigcup_{S \in \bar{\mathcal{E}}(C)} S\right). \qquad (42)$$

Then, the $L$-star size of $\mathcal{H}$, denoted by $L$-ss($\mathcal{H}$), is the maximum independence number $\alpha_{\bar{\mathcal{E}}(C)}(U(C))$ over
all connected components $C$ of $\mathcal{H} - L$, i.e.

$$L\text{-ss}(\mathcal{H}) = \max \left\{ \alpha_{\bar{\mathcal{E}}(C)}(U(C)) \mid C \text{ is a connected component of } \mathcal{H} - L \right\}.$$
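
Definition F.7 can be made concrete with a brute-force Python sketch. All function names are hypothetical, and the exhaustive search over subsets is only meant for tiny hypergraphs; it assumes that $\mathcal{H} - L$ is obtained by deleting the vertices of $L$ from every hyperedge.

```python
from itertools import combinations


def components(vertices, edges):
    """Vertex sets of the connected components of a hypergraph, by merging along edges."""
    comps = [{v} for v in vertices]
    for e in edges:
        touched = [c for c in comps if c & e]
        if len(touched) > 1:
            merged = set().union(*touched)
            comps = [c for c in comps if not (c & e)] + [merged]
    return comps


def independence_number(F, B):
    """Largest I inside B such that no two vertices of I share a hyperedge of F."""
    B = sorted(B)
    for size in range(len(B), 0, -1):
        for I in combinations(B, size):
            if all(len(e & set(I)) <= 1 for e in F):
                return size
    return 0


def l_star_size(V, E, L):
    E = [frozenset(e) for e in E]
    L = frozenset(L)
    best = 0
    for C in components(set(V) - L, [e - L for e in E]):
        E_bar = [e for e in E if e & C]                       # original edges touching C
        U_C = L & frozenset().union(*E_bar) if E_bar else frozenset()
        best = max(best, independence_number(E_bar, U_C))
    return best


# Hypergraph with V = [n+1], edges [n] and {i, n+1} for i in [n], and L = [n].
n = 4
V = range(1, n + 2)
E = [set(range(1, n + 1))] + [{i, n + 1} for i in range(1, n + 1)]
result = l_star_size(V, E, range(1, n + 1))
```

In this instance the only component of $\mathcal{H} - L$ is the single vertex $n + 1$, the set $U(C)$ is all of $[n]$, and no two of its vertices share an edge of $\bar{\mathcal{E}}(C)$, so the sketch reports $n$.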


Note that in the above definition $\bar{\mathcal{E}}(C)$ is not necessarily the same as $\mathcal{E}(C)$. In particular, the edges in $\mathcal{E}(C)$ do not
contain any member of $L$, whereas the edges of $\bar{\mathcal{E}}(C)$ are the original edges of $\mathcal{H}$. To relate $L$-ss
to faqw, we need a simple technical lemma.
Lemma F.8. Let $(T, \chi)$ be a tree decomposition of a hypergraph $\mathcal{H} = (V, \mathcal{E})$, where the fractional hypertree
width of $(T, \chi)$ is $w$. Let $L \subseteq V$ be an arbitrary set of vertices of $\mathcal{H}$. Then, there exists a tree decomposition
$(T', \chi')$ of $\mathcal{H}$ with the same fractional hypertree width satisfying the following condition. Let $V(C_1)$ and
$V(C_2)$ be the vertex sets of two different connected components of $\mathcal{H} - L$. Then, there is no bag in $(T', \chi')$
intersecting both $V(C_1)$ and $V(C_2)$.

Proof. We modify $(T, \chi)$ gradually to satisfy the required condition without increasing $\rho^*_{\mathcal{H}}(B)$ for any bag $B$
of the tree decomposition. Fix an arbitrary root $r$ of $(T, \chi)$. While there still are two connected components
$C_1$ and $C_2$ for which the desired condition is violated, we do the following.

Let $B$ be the bag farthest from the root $r$ such that $B$ intersects both $V(C_1)$ and $V(C_2)$. In particular,
every child of $B$ intersects at most one of the two sets $V(C_1)$ and $V(C_2)$. Now, we split $B$ into two bags
$B_1 = B \setminus V(C_2)$ and $B_2 = B \setminus V(C_1)$. Remove $B$ and connect both $B_1$ and $B_2$ to the parent of $B$. (If $B$ is
already the root, then we connect $B_1$ and $B_2$ with an edge.) Children of $B$ which intersect $V(C_i)$ are connected
to $B_i$, for $i \in \{1, 2\}$. Children which intersect neither $V(C_1)$ nor $V(C_2)$ are connected arbitrarily to $B_1$ or $B_2$. It
is straightforward to verify that we still have a tree decomposition for $\mathcal{H}$ and that $\rho^*_{\mathcal{H}}(B_i) \le \rho^*_{\mathcal{H}}(B)$. □

Theorem F.9. Let $\varphi$ be the FAQ-query defined in (39). For any hypergraph $\mathcal{H} = (V, \mathcal{E})$ and any subset $L$
of vertices of $\mathcal{H}$, we have

(i) faqw($\varphi$) $\le (1 + L\text{-ss}(\mathcal{H})) \cdot$ fhtw($\mathcal{H}$).

(ii) Furthermore, there is a class of hypergraphs $\mathcal{H}$ and sets $L$ for which faqw($\varphi$) is bounded but $L$-ss($\mathcal{H}$) is
unbounded.

Proof. To prove (i), we show that there exists a $\varphi$-equivalent vertex ordering $\sigma$ such that faqw($\sigma$) $\le (1 +
L\text{-ss}(\mathcal{H})) \cdot$ fhtw($\mathcal{H}$). We do so by constructing a tree decomposition $(T, \chi)$ of $\mathcal{H}$ for which the fractional
hypertree width of $(T, \chi)$ is at most $(1 + L\text{-ss}(\mathcal{H})) \cdot$ fhtw($\mathcal{H}$), and for which there exists a GYO-elimination
order where vertices in $L$ are eliminated last.

The tree decomposition $(T, \chi)$ is constructed as follows. Let $(T_L, \chi_L)$ be a tree decomposition of $\mathcal{H}$ with
fractional hypertree width equal to fhtw($\mathcal{H}$). From Lemma F.8 we can assume that every bag of $(T_L, \chi_L)$
intersects at most one set $V(C)$ for each connected component $C$ of $\mathcal{H} - L$. A bag that intersects $V(C)$ is
called a $C$-bag. Then, adapting an argument in [34], for every $C$-bag $\chi_L(t)$ ($t \in T_L$), we amend $\chi_L(t)$ with

$$\bar{\chi}_L(t) = (\chi_L(t) \cup U(C)) \setminus V(C),$$

where $U(C)$ is defined as in Definition F.7. After this step is done, $(T_L, \chi_L)$ is a tree decomposition of the
restriction of $\mathcal{H}$ on $L$. This is because the collection of all $C$-bags forms a connected subtree of $(T_L, \chi_L)$
before the amendments.

Next, for each connected component $C$ of $\mathcal{H} - L$, we construct a tree decomposition $(T_C, \chi_C)$ with
fractional hypertree width at most fhtw($\mathcal{H}$). (This is possible because the fhtw of a subgraph is at most the
fhtw of the supergraph.) Then, for each bag $\chi_C(t)$ of $(T_C, \chi_C)$ we set

$$\bar{\chi}_C(t) = \chi_C(t) \cup U(C).$$

Finally, we construct the tree decomposition $(T, \chi)$ by connecting the tree decompositions $(T_C, \chi_C)$ to the
tree decomposition $(T_L, \chi_L)$. This is done by connecting an arbitrary bag of $(T_C, \chi_C)$ to a bag of $(T_L, \chi_L)$
that used to be a $C$-bag.

We claim the following:

Claim 1. $(T, \chi)$ is indeed a tree decomposition of $\mathcal{H}$.

Claim 2. $(T, \chi)$ has fractional hypertree width at most fhtw($\mathcal{H}$) $\cdot (1 + L\text{-ss}(\mathcal{H}))$.

Assuming the claims hold, let us complete the proof of part (i) of the theorem. We construct the $L$-prefixed
variable ordering by showing that we can eliminate vertices from $V - L$ before any vertex in $L$ using
the GYO-elimination procedure. Fix a connected component $C$ of $\mathcal{H} - L$; then vertices in $V(C)$ all reside in
the subtree $(T_C, \chi_C)$. We now run the GYO-elimination procedure on this subtree up to its root $r$, which
was the node that was connected to the center tree $(T_L, \chi_L)$. In this process, the vertices in $U(C)$ are not
eliminated since the neighbor of $r$ contains $U(C)$. We inductively eliminate $V(C')$ for the other connected
components $C'$ of $\mathcal{H} - L$. Part (i) of the theorem is thus proved.

We next prove Claim 1. Every hyperedge $S \in \mathcal{E}$ is either completely contained in $L$, or $S \cap V(C) \neq \emptyset$
for some $C$. If $S \subseteq L$, then $S$ is a subset of some bag in $(T_L, \chi_L)$, which means it is contained in some
bag of the final tree decomposition $(T, \chi)$. If $S \cap V(C) \neq \emptyset$, then $S \cap V(C)$ is contained in some bag of the
original tree decomposition $(T_C, \chi_C)$. And so $S$ is contained in some bag of $(T_C, \chi_C)$ after we amend each
of those bags with the set $U(C)$. To verify the running intersection property (RIP), fix a vertex $v \in V$. If
$v \in V(C)$ for some connected component $C$ of $\mathcal{H} - L$, then RIP holds for $v$ even after the amendments of
bags in $(T_C, \chi_C)$. If $v \in L$, then RIP also holds for $v$ because $(T_L, \chi_L)$ is a tree decomposition too.

Finally, we prove Claim 2. The crucial point to notice is that each bag is amended at most once with
$U(C)$ for some $C$. This is because a $C$-bag is not a $C'$-bag for two different connected components $C$ and
$C'$, thanks to Lemma F.8. From Lemma 19 of Durand and Mengel [35], each set $U(C)$ can be covered by at
most $L$-ss($\mathcal{H}$) many $C$-bags. In particular, for each amended bag $\bar{\chi}_C(t) = \chi_C(t) \cup U(C)$ we have

$$\rho^*_{\mathcal{H}}(\bar{\chi}_C(t)) \le \mathrm{fhtw}(\mathcal{H}) + \rho^*_{\mathcal{H}}(U(C))$$
$$\le \mathrm{fhtw}(\mathcal{H}) + L\text{-ss}(\mathcal{H}) \cdot \mathrm{fhtw}(\mathcal{H})$$
$$= (1 + L\text{-ss}(\mathcal{H})) \cdot \mathrm{fhtw}(\mathcal{H}).$$

Similarly, each amended bag $\bar{\chi}_L(t)$ is also fractionally covered by at most the same quantity, which completes
the proof of the claim.

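To prove part (ii) of the theorem, consider the hypergraph $\mathcal{H} = (V, \mathcal{E})$ where

$$V = [n+1], \qquad \mathcal{E} = \{[n], \{1, n+1\}, \{2, n+1\}, \ldots, \{n, n+1\}\}, \qquad L = [n].$$

For this graph, faqw($\varphi$) $= 1 + (n-1)/n = 2 - 1/n < 2$, and $L$-ss($\mathcal{H}$) $= n$. □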
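
The value $2 - 1/n$ can be double-checked with exact arithmetic. The following Python sketch is illustrative only and the function name is hypothetical. It exhibits a fractional edge cover of the full vertex set $[n+1]$ (the set created when vertex $n+1$ is eliminated first) together with a feasible solution of the dual packing linear program of the same value, so by linear programming duality both are optimal.

```python
from fractions import Fraction


def check_cover_value(n):
    """Check a primal cover and a dual packing of value 2 - 1/n for the hypergraph above."""
    V = range(1, n + 2)
    edges = [frozenset(range(1, n + 1))] + [frozenset({i, n + 1}) for i in range(1, n + 1)]
    # Primal: weight (n-1)/n on the edge [n] and 1/n on each edge {i, n+1}.
    lam = [Fraction(n - 1, n)] + [Fraction(1, n)] * n
    primal_ok = all(sum(w for e, w in zip(edges, lam) if v in e) >= 1 for v in V)
    # Dual: vertex weight 1/n on each i in [n] and (n-1)/n on vertex n+1.
    y = {v: Fraction(1, n) for v in range(1, n + 1)}
    y[n + 1] = Fraction(n - 1, n)
    dual_ok = all(sum(y[v] for v in e) <= 1 for e in edges)
    return primal_ok, dual_ok, sum(lam), sum(y.values())


primal_ok, dual_ok, primal_value, dual_value = check_cover_value(5)
```

Both solutions are feasible and have the same total weight, namely $2 - 1/n$.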

F.2.3 Approximating faqw($\varphi$)

We now prove a lower bound on faqw($\varphi$) that leads to an approximation algorithm for computing faqw($\varphi$)
using an approximation algorithm for fhtw($\mathcal{H}$) as a blackbox. Recall that we are still considering the FAQ-query
$\varphi$ of the special form (39). Some of the ideas developed in this section will lead to the approximation
algorithm for faqw($\varphi$) for the general FAQ-query case.

Let $\mathcal{H} = (V, \mathcal{E})$ be a hypergraph and $L \subseteq V$ be a subset of vertices. For each connected component $C$ of
$\mathcal{H} - L$, let the sets $\bar{\mathcal{E}}(C)$ and $U(C)$ be as in Definition F.7. Define the hypergraph $\mathcal{H}_L = (L, \mathcal{E}_L)$ where

$$\mathcal{E}_L = \{S \mid S \in \mathcal{E} \wedge S \subseteq L\} \cup \{U(C) \mid C \text{ is a connected component of } \mathcal{H} - L\}. \qquad (43)$$

Lemma F.10. For any connected component $C$ of $\mathcal{H} - L$ we have

$$\mathrm{faqw}(\varphi) \ge \mathrm{fhtw}(C)$$
$$\mathrm{faqw}(\varphi) \ge \rho^*_{\mathcal{H}}(U(C)).$$

(Note that $C$ itself is a hypergraph.) Furthermore,

$$\mathrm{faqw}(\varphi) \ge \mathrm{fhtw}(\mathcal{H}_L).$$

Proof. For any connected component $C$ of $\mathcal{H} - L$, it is easy to see that

$$\mathrm{faqw}(\varphi) \ge \mathrm{fhtw}(\mathcal{H}) \ge \mathrm{fhtw}(C).$$

Let $\sigma = (v_1, \ldots, v_n)$ be an $L$-prefixed variable ordering of $\mathcal{H}$ with faqw($\sigma$) $=$ faqw($\varphi$). (Such a variable
ordering exists due to (40).) For a connected component $C$ of $\mathcal{H} - L$, let $k$ be the smallest integer such that
$k > |L|$ and $v_k \in V(C)$. Then, the set $U_k$ is precisely $U(C) \cup \{v_k\}$. (This is the reason why we chose the
notation $U(C)$.) To see this, note that each time we eliminate a vertex from $C$, we insert back a hyperedge
interconnecting all its neighbors to the next hypergraph in the hypergraph sequence. And so by the time we
reach $U_k$ all of the neighbors of $C$ in $L$ are connected, which is the set $U(C)$. It follows that

$$\mathrm{faqw}(\varphi) = \mathrm{faqw}(\sigma) \ge \rho^*_{\mathcal{H}}(U_k) \ge \rho^*_{\mathcal{H}}(U(C)).$$

It remains to prove faqw($\varphi$) $\ge$ fhtw($\mathcal{H}_L$). From the above argument, for every connected component $C$
of $\mathcal{H} - L$, the set $U(C)$ is a hyperedge of the hypergraph that remains after all vertices outside $L$ have been
eliminated, which is precisely the graph $\mathcal{H}_L$ defined above. We thus have

$$\mathrm{faqw}(\varphi) = \min_{\sigma \in \mathrm{LinEx}(P)} \left(\max\{\rho^*_{\mathcal{H}}(U^\sigma_k) \mid 1 \le k \le n\}\right)$$
$$\ge \min_{\sigma \in \mathrm{LinEx}(P)} \left(\max\{\rho^*_{\mathcal{H}}(U^\sigma_k) \mid 1 \le k \le |L|\}\right)$$
$$\ge \min_{\tau} \left(\max\{\rho^*_{\mathcal{H}_L}(U^\tau_k) \mid 1 \le k \le |L|\}\right)$$
$$= \mathrm{fhtw}(\mathcal{H}_L),$$

where $\tau$ is a variable ordering of the variables in $L$ (as opposed to $\sigma$, which was a variable ordering of $[n]$). □

Theorem F.11. Let $\varphi$ be any FAQ query of the form of (39) whose hypergraph is $\mathcal{H}$. Suppose there is an
approximation algorithm that, given any hypergraph $\mathcal{H}'$, outputs a tree decomposition of $\mathcal{H}'$ with fractional
hypertree width at most $g(\mathrm{fhtw}(\mathcal{H}'))$ in time $t(|\mathcal{H}'|, \mathrm{fhtw}(\mathcal{H}'))$ for some non-decreasing functions $g, t$. Then,
we can in time $|\mathcal{H}| \cdot t(|\mathcal{H}|, \mathrm{faqw}(\varphi))$ compute a $\varphi$-equivalent vertex ordering $\sigma$ such that

$$\mathrm{faqw}(\sigma) \le \mathrm{faqw}(\varphi) + g(\mathrm{faqw}(\varphi)).$$

Proof. We use the blackbox approximation algorithm to construct a tree decomposition $(T_C, \chi_C)$ for every
connected component $C$ of $\mathcal{H} - L$ and also construct a tree decomposition $(T_L, \chi_L)$ for $\mathcal{H}_L = (L, \mathcal{E}_L)$.
(Recall that $\mathcal{E}_L$ was defined in (43).) Then, we form a tree decomposition $(T, \chi)$ by connecting these tree
decompositions together in the following way. We add the set $U(C)$ to each bag of the tree decomposition
$(T_C, \chi_C)$, and arbitrarily connect any node of the tree $(T_C, \chi_C)$ to a bag $B$ of the tree $(T_L, \chi_L)$ for which
$U(C) \subseteq B$.

From Lemma F.10 and the fact that $g$ is non-decreasing, to cover any bag of the sub-tree $(T_L, \chi_L)$ we
need a fractional cover number of at most

$$g(\mathrm{fhtw}(\mathcal{H}_L)) \le g(\mathrm{faqw}(\varphi)).$$

To cover any bag of the sub-tree $(T_C, \chi_C)$ (after $U(C)$ is added), we need a fractional cover number of at
most

$$\rho^*_{\mathcal{H}}(U(C)) + g(\mathrm{fhtw}(C)) \le \mathrm{faqw}(\varphi) + g(\mathrm{faqw}(\varphi)).$$

Finally, we obtain $\sigma$ by running the GYO-elimination procedure on the combined tree decomposition $(T, \chi)$,
making sure that variables in $L$ are eliminated last. □

By applying the above theorem using the fractional hypertree width approximation algorithm from
Marx [60], we obtain the following corollaries.

Corollary F.12. Let $\varphi$ be any FAQ query of the form of (39). Suppose faqw($\varphi$) $\le c$ for some constant $c$.$^{22}$
Then, in polynomial time we can compute a $\varphi$-equivalent vertex ordering $\sigma$ such that faqw($\sigma$) $= O(\mathrm{faqw}(\varphi)^3)$.
In particular, we can solve the FAQ problem of the form (39) in time

$$O\left(N^{O(\mathrm{faqw}^3(\varphi))}\right).$$

22 This assumption is inherited from Marx's approximation algorithm [60].

Corollary F.13. Assuming faqw($\varphi$) $\le c$ for some constant $c$, the #CQ problem is solvable in time

$$O\left(N^{O(\mathrm{faqw}^3(\varphi))}\right).$$

In particular, #CQ is tractable for the class of conjunctive queries for which faqw($\varphi$) is bounded.

Note that due to Theorem F.9, the above corollary implies the result by Durand and Mengel [34] that
#CQ is solvable in time $O(N^{\mathrm{poly}(L\text{-ss}(\mathcal{H}),\, \mathrm{fhtw}(\mathcal{H}))})$. Furthermore, since as we have observed there are classes
of graphs for which faqw($\varphi$) $\ll L$-ss($\mathcal{H}$), our result is strictly stronger.

G Factor representations

G.1 More on the factor oracle

We argue that the listing representation (Definition 4.1) satisfies both the Conditional query assumption and
the Product-marginalization assumption as long as we choose an appropriate data structure to support listing.
Furthermore, answering these queries from the data structure takes at most $O(1)$ time. We will assume
that the representation can depend on a given ordering $\sigma$ among the variables (we will see shortly why this
latter requirement is needed).

Assume the non-0 elements of a factor $\psi_S$ are stored in a B-tree data structure that respects $\sigma$. In
particular, we will store the tuples $\mathbf{z}$ such that $\psi_S(\mathbf{z}) \neq 0$ as follows. Except for the root node, all the
nodes in one level correspond to the same variable, and the ordering of the variables (when sorted from the
level closest to the root to the farthest) is consistent with $\sigma$. In other words, the children of the root node
correspond to all $x_1$ such that $\psi_S(\cdot \mid x_1) \not\equiv 0$ and are labeled with the corresponding value of $x_1$. (Here we
assume that 1 comes first in $\sigma$.) Further, these children are sorted in (say) increasing order of the $x_1$ values.
Then for each child of such an $x_1$ we build the tree recursively for $\psi_S(\cdot \mid x_1)$. Finally, the leaves (which
correspond to a vector $\mathbf{z}$ formed by concatenating the labels on the unique path from the leaf to the root)
also store $\psi_S(\mathbf{z})$: we will call this the value of the leaf. Note that it has to be the case that $\psi_S(\mathbf{z}) \neq 0$.

We now argue why this listing representation satisfies the Conditional query assumption. Assume WLOG
that $\sigma = (1, 2, \ldots, n)$. Then given $\mathbf{x}_{[k]}$, we go down the path in the B-tree labeled with $\mathbf{x}_{[k]}$. Say $u$ is the
last node on this path. Then computing $x_{k+1}$ basically corresponds to figuring out where $y$ lies in the sorted
list of values of all children of $u$. Thus, with a binary search we can return the desired result. Note that if
the query did not respect the ordering $\sigma$, then this would not have been possible.
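
The conditional query can be sketched in Python as follows. This is a hypothetical illustration rather than the paper's data structure: the class name `SortedTrie` is invented, a dictionary keyed by prefixes stands in for B-tree nodes, and the query is assumed to ask for the smallest child value that is at least $y$.

```python
from bisect import bisect_left


class SortedTrie:
    """Trie over tuples in a fixed variable order; the children of each node are kept sorted."""

    def __init__(self, tuples):
        self.children = {}  # prefix tuple -> sorted list of the next values
        seen = set()
        for t in sorted(tuples):
            for k in range(len(t)):
                key = (t[:k], t[k])
                if key not in seen:
                    seen.add(key)
                    self.children.setdefault(t[:k], []).append(t[k])

    def next_value(self, prefix, y):
        """Smallest child value >= y under `prefix`, or None if there is none."""
        values = self.children.get(tuple(prefix), [])
        i = bisect_left(values, y)  # binary search in the sorted child list
        return values[i] if i < len(values) else None


trie = SortedTrie([(1, 2, 3), (1, 2, 5), (1, 4, 1), (2, 1, 1)])
answer = trie.next_value((1, 2), 4)
```

Because tuples are inserted in sorted order, each child list is already sorted, so a single binary search per query suffices once the prefix has been located.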

We now argue why the listing representation satisfies the Product-marginalization assumption. Assume
that under the ordering $\sigma$, $i$ comes last in $S$. Then one can perform product marginalization given $i$ and
the B-tree representation of $\psi_S$ as follows. Go through all leaves of $\psi_S$ and multiply the values of all leaves
with the same parent, (pretend to) throw away the leaves and store the computed product as the value of
the new leaf just constructed. (We don’t really throw the leaves away because marginalizing just means we
pretend the depth of the trie/B-tree is one less.) Again note that we need the query to respect the ordering
$\sigma$.
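
A minimal Python sketch of this step is given below. The function name and the dictionary encoding are hypothetical. It also makes one assumption that the text leaves implicit: since only non-zero entries are listed, a prefix whose children do not cover the whole domain of $X_i$ has a zero entry among them, so its product is 0 and the prefix is dropped.

```python
from collections import defaultdict
from math import prod


def product_marginalize(factor, domain_size):
    """Multiply out the last coordinate of a factor given in listing form.

    `factor` maps tuples (last coordinate = variable to eliminate) to non-zero values.
    """
    groups = defaultdict(list)
    for t, value in factor.items():
        groups[t[:-1]].append(value)  # leaves with the same parent share a prefix
    # Assumption: a missing entry stands for 0, so the product is non-zero
    # only when every value of the eliminated variable is present.
    return {p: prod(vs) for p, vs in groups.items() if len(vs) == domain_size}


psi = {(1, 0): 2, (1, 1): 3, (2, 0): 5}
marginal = product_marginalize(psi, domain_size=2)
```

In a trie the grouping comes for free, because leaves with the same parent are stored contiguously.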

In all our algorithms, we will have the case that all the queries respect the ordering $\sigma$ of the B-trees.
This is because the algorithm is given as an input a variable ordering $\sigma$ and all its queries respect this
ordering. Then in essentially linear time one can construct the B-tree representation of all factors with a
simple pre-processing step. (There is one caveat, which is the indicator projections of the input factors; but
those can also be pre-processed in the same amount of time.)

G.2 Truth table representation

In the truth-table representation, each input factor $\psi_S$ is represented using a table of $\prod_{i \in S} |\mathrm{Dom}(X_i)|$ many
rows. Each row lists the parameters and the value of the function. The value can be implicit as in the set
semiring.

One example is the (dense) matrix multiplication problem where each input and output matrix is represented
by listing out all of its entries. Another example is the textbook description of PGM inference, where
conditional probability tables are often assumed to use the truth-table representation.

The truth-table representation makes problems easier because the input sizes are larger. However, it
should be obvious that this representation is wasteful when the input factors are sparse, in the sense that
they might have many 0-valued entries.

It is easy to see that the truth-table representation can be converted to the listing format in linear time.
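
As a small illustration of this conversion, the following Python sketch (with a hypothetical function name and a dictionary encoding chosen for brevity) scans the table once and keeps only the non-zero rows.

```python
from itertools import product


def truth_table_to_listing(table, domains, zero=0):
    """Keep only the rows of a truth table whose value differs from the semiring zero."""
    return {t: table[t] for t in product(*domains) if table[t] != zero}


domains = [range(2), range(2)]
table = {(0, 0): 0, (0, 1): 1.5, (1, 0): 0, (1, 1): 2}
listing = truth_table_to_listing(table, domains)
```

The scan visits every row exactly once, so the cost is linear in the size of the truth table.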

G.3 Succinct Representation 

This section considers four specific representations of factors where the goal is to (i) have a more succinct 
representation of the factors than the listing representation and (ii) have an effective representation on 
which one can run efficient algorithms without having to “unpack” the succinct representation (into the 
listing representation). Our main aim will be to explain how these representations are essentially encodings 
of each input factor as an output of another FAQ (or FAQ-SS) instance. In particular, we will use our notation 
as opposed to those used in existing work. 

GDNFs. We start with the generalized disjunctive normal form (GDNF) for CSPs that was first considered
by Chen and Grohe [25]. Recall that in the CSP problem (Example A.4) the corresponding FAQ-SS instance
is to compute

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

For the case when each of the $\psi_S$ is presented in the listing representation, we get back the Boolean conjunctive
query. In the GDNF format, for each $S \in \mathcal{E}$ we have

$$\psi_S(\mathbf{x}_S) = \bigvee_{i=1}^{m_S} \bigwedge_{j \in S} \psi_S^{(i,j)}(x_j).$$

For each $(i, j) \in [m_S] \times S$, $\psi_S^{(i,j)}$ is presented in the listing representation. We note two things: (i) the
GDNF representation can bring down the representation size from $\prod_{j \in S} |\mathrm{Dom}(X_j)|$ (potentially) to
$O(m_S \cdot \sum_{j \in S} |\mathrm{Dom}(X_j)|)$, which can be a big saving, and (ii) the effective FAQ problem that we need to solve is

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \bigvee_{i=1}^{m_S} \bigwedge_{j \in S} \psi_S^{(i,j)}(x_j).$$
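
To make the size comparison concrete, here is a small Python sketch. The encoding is hypothetical: each disjunct is a dictionary from a variable index $j$ to the set of values allowed by the unary factor $\psi_S^{(i,j)}$.

```python
def gdnf_eval(terms, x):
    """Evaluate a GDNF factor: OR over disjuncts of AND over per-variable membership tests."""
    return any(all(x[j] in allowed for j, allowed in term.items()) for term in terms)


def gdnf_size(terms):
    """Total number of listed values across all unary factors."""
    return sum(len(allowed) for term in terms for allowed in term.values())


# A factor over two variables with domain {0, ..., 99} each.
terms = [
    {0: set(range(100)), 1: set(range(50))},  # a 100 x 50 "rectangle"
    {0: {7}, 1: {99}},                        # one extra tuple
]
listing_size = sum(gdnf_eval(terms, (a, b)) for a in range(100) for b in range(100))
```

Here the listing representation needs 5001 tuples while the GDNF stores only 152 values.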


Decision Diagram Representation. We now consider the decision diagram representation of Chen and
Grohe [25], which is a generalization of the well-studied ordered binary decision diagrams or OBDDs. Again
we consider the CSP problem:

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

In the decision diagram representation each $\psi_S$ is represented as follows. For brevity we consider $S =
\{1, \ldots, s\}$:

$$\psi_S(x_1, \ldots, x_s) = \bigvee_{(y_0, y_1, \ldots, y_s)} \bigwedge_{i=1}^{s} \psi_S^{(i)}(y_{i-1}, x_i, y_i),$$

where for each $S$ and $i$, $\psi_S^{(i)}$ is represented in the listing format. We make three remarks: (i) An equivalent
way to think about the above representation (which is how it is presented in [25] and makes the connection
to OBDDs more apparent) is the following. All the tuples $\mathbf{x} = (x_1, \ldots, x_s)$ such that $\psi_S(\mathbf{x}) = 1$ correspond
to the sequence of labels on all paths of length $s$ in the following layered graph (with $s + 1$ layers):
$\psi_S^{(i)}(y_{i-1}, x_i, y_i) = 1$ if and only if there is an edge from vertex $y_{i-1}$ (in layer $i - 1$) to vertex $y_i$ (in layer
$i$) that is labeled $x_i$; (ii) It is easy to see that a GDNF representation can be converted into a decision
diagram of essentially the same size, and it is shown in [25] that the decision diagram representation can be
exponentially smaller than any equivalent GDNF representation; (iii) Note that the effective FAQ problem
that we need to solve is

$$\varphi = \bigvee_{\mathbf{x}} \bigwedge_{S \in \mathcal{E}} \bigvee_{(y^S_0, y^S_1, \ldots, y^S_{|S|})} \bigwedge_{i=1}^{|S|} \psi_S^{(i)}\left(y^S_{i-1}, x_{S[i]}, y^S_i\right),$$

where $S[1], S[2], \ldots, S[|S|]$ is some ordering of elements in $S$.

Factorized Databases. Olteanu and Zavodny [73] considered factorized representations for conjunctive
queries (Example A.5). Recall that we consider the following instance of FAQ-SS:

$$\varphi(\mathbf{x}_F) = \bigvee_{\mathbf{x}_{[n]-F}} \bigwedge_{S \in \mathcal{E}} \psi_S(\mathbf{x}_S).$$

In the factorized representation (called f-representation in [73]), each $\psi_S$ is represented recursively as follows.
In the general case either

$$\psi_S(\mathbf{x}) = \bigvee_{i=1}^{\ell} \psi_S^{(i)}(\mathbf{x}) \qquad (44)$$

or

$$\psi_S(\mathbf{x}) = \bigwedge_{i=1}^{k} \psi_S^{(i)}(\mathbf{x}_{S_i}), \qquad (45)$$

where $S_1, \ldots, S_k$ is a partition of $S$. In the base case we have factors of the form $\psi_{\{j\}} : \mathrm{Dom}(X_j) \to \{0, 1\}$.
We make five remarks: (i) An alternate formulation of f-representation is as follows. We can represent
all the $\mathbf{x}$ such that $\psi_S(\mathbf{x}) = 1$ in the following tree format. The leaves of the trees have a single value for
some attribute. Each of the internal nodes is either a ‘union’ node (corresponding to (44)) or a ‘Cartesian
product’ node (corresponding to (45)) with the natural semantics attached to these nodes; (ii) It is easy to
check that GDNFs are a special case of f-representations; (iii) A generalization of f-representation considered
in [73], called d-representation, is where one takes an f-representation and removes repeated sub-expressions:
alternatively, in the tree representation mentioned in (i), one ‘short-circuits’ sub-trees that are repeated,
resulting in a DAG; (iv) The main contribution of [73] is to show that if the factors are presented in a
factorized representation, then the output of $\varphi$ can also be represented in the same format (and they prove
tight bounds on the output size): we briefly touch on compressing the output in Section 8.4; (v) One can think
of $\varphi$ with the factors in a factorized representation as a big FAQ instance (where each $\psi_S$ is represented by an
FAQ instance itself over the Boolean semiring, where this instance is determined by the recursive definition
of $\psi_S$ above).
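
The tree formulation in remark (i) can be sketched in Python. The node encoding below is hypothetical and is not the notation of [73]: a leaf is `("leaf", attribute, value)`, and internal nodes are `("union", children)` or `("product", children)`.

```python
from itertools import product


def enumerate_frep(node):
    """Yield the tuples (as dicts attribute -> value) represented by an f-representation tree."""
    kind = node[0]
    if kind == "leaf":
        _, attribute, value = node
        yield {attribute: value}
    elif kind == "union":
        for child in node[1]:
            yield from enumerate_frep(child)
    else:  # "product": the children range over disjoint attribute sets
        for combo in product(*(list(enumerate_frep(c)) for c in node[1])):
            merged = {}
            for part in combo:
                merged.update(part)
            yield merged


# ({A=1} ∪ {A=2}) × ({B=1} ∪ {B=2} ∪ {B=3}): five leaves represent six tuples.
tree = ("product", [
    ("union", [("leaf", "A", 1), ("leaf", "A", 2)]),
    ("union", [("leaf", "B", 1), ("leaf", "B", 2), ("leaf", "B", 3)]),
])
tuples = list(enumerate_frep(tree))
```

The example shows the source of the savings: a Cartesian product node stores its operands side by side instead of materializing their product.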

Fast Matrix Vector Multiplication. We saw in Example A.13 that for the DFT matrix $F$ we can represent
the factor corresponding to the matrix more succinctly. We will now use the FAQ-SS formulation to
describe a generic way to talk about matrices $A$ for which we can get a sub-quadratic matrix vector
multiplication algorithm: further, our algorithm is the same. The only thing that changes is the representation
of $A$ and the elimination order. Next, we provide more details on this claim.

Recall that we are aiming to compute $A \cdot b$. Assuming $n = n_0^m$ for integers $m, n_0 \ge 1$, the most general
way to represent an $n \times n$ matrix $A = (A_{ij})$ is via the following factor

$$\psi_A(x_0, \ldots, x_{m-1}, y_0, \ldots, y_{m-1}) = A_{(x_0, \ldots, x_{m-1}), (y_0, \ldots, y_{m-1})},$$

where we think of $(x_0, \ldots, x_{m-1}), (y_0, \ldots, y_{m-1})$ as integers in the range $[0, n-1]$. Now assume that there
is a hypergraph $\mathcal{H}_A = (\{X_0, \ldots, X_{m-1}, Y_0, \ldots, Y_{m-1}\}, \mathcal{E}_A)$ such that

$$\psi_A(x_0, \ldots, x_{m-1}, y_0, \ldots, y_{m-1}) = \prod_{e \in \mathcal{E}_A} \psi_e((\mathbf{x}, \mathbf{y})_e), \qquad (46)$$

where we use $(\mathbf{x}, \mathbf{y})_e$ as a shorthand for the projection of $(x_0, \ldots, x_{m-1}, y_0, \ldots, y_{m-1})$ onto $e$ (i.e. if $X_j \in e$
then we pick $x_j$ and if $Y_j \in e$ then we pick $y_j$). With this notation in place, recall we are trying to solve the
problem:

$$\varphi_{A \cdot b}(x_0, \ldots, x_{m-1}) = \sum_{\mathbf{y}} b_{\mathbf{y}} \cdot \prod_{e \in \mathcal{E}_A} \psi_e((\mathbf{x}, \mathbf{y})_e). \qquad (47)$$

Next we present some specific instantiations of the FAQ-SS instance from (46).

We begin with DFT and related matrices. In particular, recall the DFT matrix

$$F = (F_{x,y})_{0 \le x, y < n}$$

(with entries as given in Example A.13), and from Example A.13 that in this case we have the hypergraph
$\mathcal{H}_F = (V_F, \mathcal{E}_F)$ with

$$V_F = \{X_0, \ldots, X_{m-1}, Y_0, \ldots, Y_{m-1}\},$$
$$\mathcal{E}_F = \{(Y_0, \ldots, Y_{m-1})\} \cup \{(X_j, Y_k)\}_{0 \le j+k < m}.$$

Consider the $n \times n$ circulant matrix $C$ where the first column is say the vector $(c_0, \ldots, c_{n-1})$ and the
rest of the columns are a cyclic shift of the previous column. It can be shown that $C \cdot b$ for a vector $b$ is the
same as the convolution of $c = (c_0, \ldots, c_{n-1})$ and $b$. In other words,

$$C \cdot b = F^{-1} \cdot \left((F \cdot c) \circ (F \cdot b)\right),$$

where $\circ$ denotes the entrywise product of two vectors.
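
The identity can be checked numerically with a short Python sketch, reading the product between $F \cdot c$ and $F \cdot b$ as entrywise. The naive quadratic `dft` below and its sign convention are assumptions made for illustration; the convolution identity holds for either sign.

```python
import cmath


def dft(v, inverse=False):
    """Naive O(n^2) discrete Fourier transform (sign convention is an assumption)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[y] * cmath.exp(sign * 2j * cmath.pi * x * y / n) for y in range(n))
           for x in range(n)]
    return [z / n for z in out] if inverse else out


def circulant_times(c, b):
    """Multiply by the circulant matrix whose first column is c: C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * b[j] for j in range(n)) for i in range(n)]


c = [1, 2, 3, 4]
b = [5, 6, 7, 8]
direct = circulant_times(c, b)
via_dft = dft([p * q for p, q in zip(dft(c), dft(b))], inverse=True)
```

The two results agree up to floating-point error, which is what allows a fast DFT algorithm to be reused for circulant matrices.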

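Note that the above is equivalent to the following FAQ-SS instance:

$$\varphi_{C \cdot b}(\mathbf{x}) = \sum_{\mathbf{y} \in \mathbb{Z}_{n_0}^m} \sum_{\mathbf{z} \in \mathbb{Z}_{n_0}^m} \sum_{\mathbf{w} \in \mathbb{Z}_{n_0}^m} b_{\mathbf{w}} \cdot c_{\mathbf{z}} \cdot \prod_{0 \le j+k < m} \psi_{Y_j, Z_k}(y_j, z_k)$$
$$\cdot \prod_{0 \le j+k < m} \psi_{Y_j, W_k}(y_j, w_k) \cdot \prod_{0 \le j+k < m} \psi_{X_j, Y_k}(x_j, y_k).$$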

Next, we consider the case when the matrix $A$ is itself a product of two or more matrices. For simplicity,
we will only consider the case when $A$ is a product of two square matrices $D$ and $E$ (of the same order). Both
of these restrictions can be removed, but we focus on these special cases since our goal is to show how our
framework gives a uniform way to measure the efficiency of our algorithm for matrix vector multiplication
with structured matrices $A$.

We begin with the Kronecker product. Given two $n_0 \times n_0$ matrices $D$ and $E$, their Kronecker product
$A = D \otimes E$ is defined as follows: for every $(x_0, x_1), (y_0, y_1) \in \mathbb{Z}_{n_0}^2$, $A[(x_0, x_1), (y_0, y_1)] = D[x_0, y_0] \cdot E[x_1, y_1]$.

In this case, we have the hypergraph $\mathcal{H}_{D \otimes E} = (\{X_0, X_1, Y_0, Y_1\}, \mathcal{E}_{D \otimes E})$, where

$$\mathcal{E}_{D \otimes E} = \{(X_0, Y_0), (Y_0, Y_1), (X_1, Y_1)\}.$$
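
For the Kronecker product, variable elimination over this hypergraph amounts to two small passes. The Python sketch below is an illustration with hypothetical function names: it eliminates $Y_1$ and then $Y_0$, and compares the result against the dense definition. Each pass performs $n_0^3 = n^{3/2}$ multiplications, whereas the dense computation performs $n_0^4 = n^2$.

```python
def kron_matvec(D, E, b):
    """Compute (D ⊗ E) b, with b indexed as b[y0][y1], by eliminating Y1 and then Y0."""
    n0 = len(D)
    # Eliminate Y1: t[y0][x1] = sum over y1 of E[x1][y1] * b[y0][y1]
    t = [[sum(E[x1][y1] * b[y0][y1] for y1 in range(n0)) for x1 in range(n0)]
         for y0 in range(n0)]
    # Eliminate Y0: out[x0][x1] = sum over y0 of D[x0][y0] * t[y0][x1]
    return [[sum(D[x0][y0] * t[y0][x1] for y0 in range(n0)) for x1 in range(n0)]
            for x0 in range(n0)]


def dense_matvec(D, E, b):
    """Reference computation straight from the definition of the Kronecker product."""
    n0 = len(D)
    return [[sum(D[x0][y0] * E[x1][y1] * b[y0][y1]
                 for y0 in range(n0) for y1 in range(n0))
             for x1 in range(n0)] for x0 in range(n0)]


D = [[1, 2], [3, 4]]
E = [[0, 1], [1, 0]]
b = [[1, 2], [3, 4]]
fast = kron_matvec(D, E, b)
```

The intermediate table `t` plays the role of the intermediate factor over $\{Y_0, X_1\}$ created when $Y_1$ is eliminated.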

Next, we consider the Khatri-Rao product [55]. We think of $E$ and $F$ as collections of $n_0^2$ matrices $E^{x_2, y_2}$
and $F^{x_2, y_2}$, each of which is an $n_0 \times n_0$ matrix. The Khatri-Rao product $A = E * F$ is defined as follows: for
every $(x_0, x_1, x_2), (y_0, y_1, y_2) \in \mathbb{Z}_{n_0}^3$, $A[(x_0, x_1, x_2), (y_0, y_1, y_2)] = E^{x_2, y_2}[x_0, y_0] \cdot F^{x_2, y_2}[x_1, y_1]$. In this case,
we have the hypergraph $\mathcal{H}_{E * F} = (\{X_0, X_1, X_2, Y_0, Y_1, Y_2\}, \mathcal{E}_{E * F})$, where

\[ \mathcal{E}_{E * F} = \{(X_0, X_2, Y_0, Y_2), (X_1, X_2, Y_1, Y_2), (Y_0, Y_1, Y_2)\}. \]

Finally, we consider the Tracy-Singh product [86]. We think of $E$ and $F$ as collections of $n_0^2$ matrices $E^{x_2, y_2}$
and $F^{x_3, y_3}$, each of which is an $n_0 \times n_0$ matrix. The Tracy-Singh product $A = E \circ F$ is defined as follows: for
every $(x_0, x_1, x_2, x_3), (y_0, y_1, y_2, y_3) \in \mathbb{Z}_{n_0}^4$, $A[(x_0, x_1, x_2, x_3), (y_0, y_1, y_2, y_3)] = E^{x_2, y_2}[x_0, y_0] \cdot F^{x_3, y_3}[x_1, y_1]$.
In this case, we have the hypergraph $\mathcal{H}_{E \circ F} = (\{X_0, X_1, X_2, X_3, Y_0, Y_1, Y_2, Y_3\}, \mathcal{E}_{E \circ F})$, where

\[ \mathcal{E}_{E \circ F} = \{(X_0, X_2, Y_0, Y_2), (X_1, X_3, Y_1, Y_3), (Y_0, Y_1, Y_2, Y_3)\}. \]

We conclude by noting that if one is willing to go beyond semirings and look at fields (in particular, 
all the matrices above are over complex numbers), then one can do faster matrix vector multiplication for 
a wider family of matrices [71,76]. Since we are interested primarily in semirings in this paper, we do not 
further explore this connection. 

G.4 Representations of sparse tables in PGMs 

We now summarize the two main succinct representations of factors in PGMs (Example A.12): the first
one corresponds to the listing representation (Definition 4.1) and the second one is similar to the decision
diagram representation (Section G.3).

Sparse tables. This is the listing representation: only list the inputs $y$ in the representation of $\psi_S$ such that
$\psi_S(y) \neq 0$. This is called evidence shrinking in [52].
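As a small illustration of the listing idea, the sketch below stores only the nonzero entries of a factor in a dictionary and treats every absent tuple as zero. The helper names `to_listing` and `lookup`, and the choice of a dictionary keyed by tuples, are assumptions made for the example.

```python
from itertools import product

def to_listing(factor, domains, zero=0):
    """Enumerate the full domain once and keep only the tuples with a nonzero value."""
    return {y: factor(*y) for y in product(*domains) if factor(*y) != zero}

def lookup(listing, y, zero=0):
    """Tuples that are not listed are implicitly zero."""
    return listing.get(tuple(y), zero)

# Hypothetical factor over {0,1,2}^2 that vanishes whenever either argument is 0.
listing = to_listing(lambda a, b: a * b, [range(3), range(3)])
```

Here 4 of the 9 possible tuples are stored, and a lookup of an unlisted tuple such as `(0, 2)` returns zero.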

Algebraic Decision Diagrams. Algebraic Decision Diagrams (or ADDs) were introduced in Bahar et 
al. [12]. This is a succinct representation that is very closely related to the decision diagrams from Section G.3, 
as we shall see shortly. (Bahar et al. defined ADDs, considered some of their basic properties, and analyzed 
(standard) algorithms for matrix multiplication, shortest path problems and linear algebra when the inputs 
for the problems were represented as ADDs.) 

We will present the definition of an ADD for representing a single factor over bits (larger domain elements can
be represented as bits and the generalization to multiple factors is straightforward). In particular, consider
the case when $\psi : \{0,1\}^n \to D$. An ADD representation for $\psi$ is a DAG $G$ with the sinks (i.e. vertices in $G$
with no outgoing edges) being labeled with values from $D$. Further, each non-sink node is labeled with one
of the $n$ variables such that the labeling is consistent with an ordering $\sigma$ of the $n$ variables. (By consistent we mean
that if $(u, v)$ is a directed edge and its endpoints are labeled $\ell(u)$ and $\ell(v)$, then $\sigma(\ell(u)) < \sigma(\ell(v))$.) Further, every
non-sink node has two outgoing edges: one labeled 0 and the other 1. Given an ADD for a factor $\psi$, one can
compute $\psi(x)$ for an input $x = (x_1, \dots, x_n)$ in the natural way: start with the root $r$, take the branch
corresponding to $x_{\ell(r)}$, continue recursively, and stop when you reach a sink $s$. The value of $\psi(x)$ is the
label of $s$ (recall that the sinks are labeled with elements from $D$).
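The evaluation procedure just described can be sketched as follows. The encoding is an assumption made for illustration: each node is a tuple, either `("sink", value)` or `("var", i, child0, child1)` with `i` a 1-based variable index, and the variable ordering is taken to be $1, \dots, n$.

```python
def evaluate_add(nodes, root, x):
    """Walk from the root, following the edge labeled x_i at a node labeled with variable i."""
    node = nodes[root]
    while node[0] == "var":
        _, var, child0, child1 = node
        node = nodes[child1 if x[var - 1] else child0]
    return node[1]  # label of the sink that was reached

# Hypothetical ADD over 3 bits: value 5 if x1 = 1; otherwise 7 if x3 = 1, else 2.
# The edges out of the root skip variable 2, so this DAG is not layered.
nodes = {
    "r": ("var", 1, "a", "s5"),
    "a": ("var", 3, "s2", "s7"),
    "s2": ("sink", 2),
    "s5": ("sink", 5),
    "s7": ("sink", 7),
}
```

For instance, `evaluate_add(nodes, "r", (0, 1, 1))` follows the 0-edge at the root and the 1-edge at the node labeled with variable 3, ending at the sink labeled 7.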

The ADD representation is very similar to the decision diagram representation presented in Section G.3.
The main difference is that unlike the decision diagram case where the underlying DAG $G$ is a layered graph,
in an ADD representation $G$ might not be a layered graph. In particular, if we put all vertices in $G$ with
the same label in a layer then there can be edges that connect layers that are not right next to each other.
However, if we are willing to tolerate a blowup of $O(n)$ in the representation size, then we can indeed convert
a $G$ for any ADD representation into a layered graph $G'$ as follows. For any edge $(w_i, w_j)$ between layers
$i$ and $j$ (for $j > i + 1$; WLOG we assume that the ordering $\sigma$ is $1, \dots, n$), we replace the edge with a dummy
path $(w_i, w_{i+1}), (w_{i+1}, w_{i+2}), \dots, (w_{j-1}, w_j)$, where $w_{i+1}, \dots, w_{j-1}$ are new nodes in $G'$. Further,
$(w_i, w_{i+1})$ has the same label as $(w_i, w_j)$ while the rest of the new edges have a label of both 0 and 1 (more
precisely, there are two parallel edges, one labeled 0 and the other labeled 1). Finally, we add an $(n+1)$th layer
and connect all sinks to distinct vertices in this layer. (Again, if the sink is in layer $i$ for $i < n$, then we will
actually need a 'dummy' path of appropriate length.) Let $L$ be the function that maps each vertex in the
$(n+1)$th layer to the corresponding sink label. Given the ADD in the layered form $G'$, one can represent $\psi$ as
before (with suitable modifications since $\psi$ is no longer binary as in Section G.3):


\[ \psi(x_1, \dots, x_n) = \sum_{y_0, \dots, y_{n+1}} \left( \prod_{i=1}^{n} \psi^{(i)}(y_{i-1}, x_i, y_i) \right) \cdot \psi^{(n+1)}(y_n, y_{n+1}) \cdot L(y_{n+1}), \]

where $\psi^{(i)}$ encodes the edges between layer $(i-1)$ and layer $i$ (for $i \in [n+1]$). In the definition above these
$\psi^{(i)}$ are binary functions and we use the listing representation for each $\psi^{(i)}$.
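The layering step can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the exact construction: nodes are encoded as `("sink", value)` or `("var", i, child0, child1)`, a node labeled with variable `i` is placed in layer `i`, all sinks are placed directly in layer `n + 1` (so `L` is read off the sinks themselves), and the result is evaluated by walking the layers instead of expanding the sum. The names `layer_add` and `evaluate_layered` are hypothetical.

```python
from itertools import product

def layer_add(nodes, n):
    """Return (layered, L): layered[u] = (layer, child0, child1); L maps sinks to labels.
    An edge that skips layers is replaced by a path of dummy nodes whose 0-edge and
    1-edge both point to the same successor."""
    def layer(u):
        return nodes[u][1] if nodes[u][0] == "var" else n + 1

    layered, L = {}, {}
    counter = [0]

    def connect(src_layer, target):
        # Build the dummy path backwards from the target up to layer src_layer + 1.
        cur = target
        for lay in range(layer(target) - 1, src_layer, -1):
            name = "dummy%d" % counter[0]
            counter[0] += 1
            layered[name] = (lay, cur, cur)
            cur = name
        return cur

    for u, node in nodes.items():
        if node[0] == "var":
            _, i, c0, c1 = node
            layered[u] = (i, connect(i, c0), connect(i, c1))
        else:
            L[u] = node[1]
    return layered, L

def evaluate_layered(layered, L, root, x):
    u = root
    while u not in L:
        lay, c0, c1 = layered[u]
        u = c1 if x[lay - 1] else c0
    return L[u]

# Hypothetical ADD over 3 bits: value 5 if x1 = 1; otherwise 7 if x3 = 1, else 2.
nodes = {"r": ("var", 1, "a", "s5"), "a": ("var", 3, "s2", "s7"),
         "s2": ("sink", 2), "s5": ("sink", 5), "s7": ("sink", 7)}
layered, L = layer_add(nodes, 3)
table = {x: evaluate_layered(layered, L, "r", x) for x in product((0, 1), repeat=3)}
```

On this example three dummy nodes are created (one for the edge from the root to the node labeled with variable 3, two for the edge from the root to the sink labeled 5), and every edge of the result connects consecutive layers.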

If the blowup by a factor of $n$ when going from the original ADD representation to the layered representation
is not acceptable, then there are potential ways to mitigate this blowup:

1. If there are $s$ sinks in total, then one can convert $G$ into a layered graph $G''$ with an additive blowup
of $O(sn + n^3)$. To see this, note that in the construction of $G'$ above, each sink leads to a dummy path
of length $O(n)$, which justifies the $O(sn)$ term. Next, note that for every edge between layers $i$ and $j$
(for $j > i + 1$), the dummy path is the same except for the label on the first edge. In other words, all
these dummy paths can share a common suffix path of length $j - i - 1$. Note that there are
$O(n^3)$ distinct such suffixes, which justifies the $O(n^3)$ term. This additive blowup is advantageous over
the multiplicative blowup of $n$ when $G$ has many more vertices than $n$ and $s$.

2. Alternatively, we can encode the ADD as an FAQ instance by essentially encoding the Boolean function
suggested by the ADD (modulo the labeling of the sink). More formally, let $r$ be the root of $G$ and let
$c_0$ and $c_1$ be its two out-neighbors (reached via the edges labeled 0 and 1, respectively). Then it is easy to see that

\[ \psi_G(x_1, \dots, x_n) = \psi_{\ell(r)}(x_{\ell(r)}) \cdot \psi_{G_{c_1}}(x_{\ell(r)+1}, \dots, x_n) + \bar{\psi}_{\ell(r)}(x_{\ell(r)}) \cdot \psi_{G_{c_0}}(x_{\ell(r)+1}, \dots, x_n), \]

where $\psi_{\ell(r)}(x_{\ell(r)}) = x_{\ell(r)}$ and $\bar{\psi}_{\ell(r)}(x_{\ell(r)}) = 1 - x_{\ell(r)}$, and $\psi_{G_{c_1}}, \psi_{G_{c_0}}$ are defined recursively. Finally,
for a sink $u$, $\psi_{G_u}(x) = L(u)$. The advantage of this representation is that there is only an $O(1)$ blowup
in the representation size, but now the size of the resulting FAQ instance is the same as the size of
the original ADD, as opposed to our earlier representation where the size of the resulting FAQ instance only
depends on $n$ and $s$ (but not on the size of $G$).
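The recursive encoding in item 2 can be checked on a small instance. The sketch below is illustrative only: it assumes nodes are encoded as `("sink", value)` or `("var", i, child0, child1)`, that $D$ is the integers, and that the factor on the 1-branch evaluates to $x_i$ while the factor on the 0-branch evaluates to $1 - x_i$. Memoizing on the node name keeps the work linear in the size of $G$ for a fixed input.

```python
from functools import lru_cache

def add_polynomial(nodes, root, x):
    """Evaluate the ADD arithmetically: value(u) = x_i * value(c1) + (1 - x_i) * value(c0)."""
    @lru_cache(maxsize=None)
    def value(u):
        node = nodes[u]
        if node[0] == "sink":
            return node[1]  # base case: a sink contributes its label
        _, var, c0, c1 = node
        xi = x[var - 1]
        return xi * value(c1) + (1 - xi) * value(c0)
    return value(root)

# Hypothetical ADD over 3 bits: value 5 if x1 = 1; otherwise 7 if x3 = 1, else 2.
nodes = {"r": ("var", 1, "a", "s5"), "a": ("var", 3, "s2", "s7"),
         "s2": ("sink", 2), "s5": ("sink", 5), "s7": ("sink", 7)}
```

On 0/1 inputs exactly one of the two terms survives at each node, so the arithmetic expression selects the same sink as a walk through the diagram.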

Gogate and Domingos [39] present message passing algorithms that work with both the sparse table and
ADD representations. Unlike this work, which focuses on exact computation, the algorithms of [39] are
approximate.